Pedro Holanda

Principal Engineer · DuckLabs

I helped build DuckDB, an open-source database for analytics that gets downloaded over 45 million times a month. I joined in 2018, back when it was a research prototype at CWI Amsterdam, and I have worked on the core engine ever since. I built the CSV engine, the Arrow integration, the ART index, and the Python client. These days I lead DuckLake. I hold a PhD in database systems from CWI/Leiden and served as COO of DuckDB Labs.

Record

DuckDB

Core engine engineer since 2018

39K stars · 45M+ monthly PyPI downloads

DuckLake

Technical lead, spec co-author since 0.1, and release manager

2.8K stars

Stewardship

Merge rights across the duckdb organization

CSV engine

Designed and built DuckDB's CSV parser

Figures from GitHub and PyPI, July 2026. Every number links to its source.

Systems

DuckLake: A data lake and catalog format built on SQL and Parquet. I led it from pre-production 0.1 to 1.0, co-writing the format specification since 0.1 and managing its releases. source
CSV engine: A parallel CSV parser that detects types, delimiters, and dialects on its own. The Pollock benchmark, designed independently of DuckDB, measured it as the most robust parser of any tested system: 9.6 out of 10 configured, 8.4 fully automatic. Our only joint work with the benchmark authors was adding DuckDB to it. source
Arrow & ADBC integration: The zero-copy Arrow bridge between DuckDB and the Python data ecosystem, plus DuckDB's driver for the ADBC standard. It has been the default way to move data in and out of the engine since 2021. source
Async I/O: Read-ahead infrastructure in DuckDB's I/O layer that overlaps fetching with compute, so scans over Parquet and object storage spend less time waiting on the network.
BIGNUM arithmetic: Arbitrary-precision integers for DuckDB, so values past the 64-bit limit stay exact instead of overflowing.
ART index: DuckDB's adaptive radix tree index. I implemented the original index and later its persistent on-disk storage, so it stays durable without losing in-memory lookup speed. source
Python client & UDFs: The foundations of DuckDB's most-used client, and the framework that lets users extend the engine in pure Python. source
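To make the read-ahead idea from the Async I/O entry above concrete, here is a minimal Python sketch. It is not DuckDB's code: `fetch_chunk`, `scan_with_read_ahead`, and the `depth` parameter are hypothetical names chosen for illustration, the sleep stands in for network latency, and a thread pool stands in for the engine's I/O layer. It only shows the general pattern of keeping a few fetches in flight while the current chunk is being processed.

```python
import time
from collections import deque
from concurrent.futures import ThreadPoolExecutor


def fetch_chunk(i):
    """Hypothetical stand-in for a slow remote read of chunk i."""
    time.sleep(0.01)  # simulated network latency (assumption)
    return list(range(i * 4, i * 4 + 4))


def scan_with_read_ahead(num_chunks, depth=2):
    """Sum every value while keeping up to `depth` fetches in flight."""
    total = 0
    with ThreadPoolExecutor(max_workers=depth) as pool:
        pending = deque()
        next_chunk = 0
        # Prime the pipeline with the first `depth` fetches.
        while next_chunk < min(depth, num_chunks):
            pending.append(pool.submit(fetch_chunk, next_chunk))
            next_chunk += 1
        while pending:
            # Block only on the oldest outstanding fetch.
            chunk = pending.popleft().result()
            # Schedule the next fetch before computing, so the two overlap.
            if next_chunk < num_chunks:
                pending.append(pool.submit(fetch_chunk, next_chunk))
                next_chunk += 1
            total += sum(chunk)  # "compute" step runs while fetches proceed
    return total
```

Calling `scan_with_read_ahead(5)` sums the twenty simulated values. With `depth=1` the compute for one chunk still overlaps the fetch of the next, but only one fetch is ever in flight, so a higher `depth` helps most when fetch latency dominates.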

Career

since 2025

Principal Engineer · DuckLabs (formerly DuckDB Labs)

I lead DuckLake and remain a major contributor to the DuckDB core, with work spanning async I/O, BIGNUM arithmetic, the Arrow integration, and other core functionality. I lead projects, coordinate design reviews, and mentor new contributors. I am one of a select few people with merge rights across the duckdb organization.

2023 – 25

Senior Software Engineer · DuckDB Labs

CSV engine, ART index, ADBC.

2022 – 23

Chief Operating Officer · DuckDB Labs

Ran hiring and training while continuing to ship code, and mentored interns who grew into senior engineers on the team. Once the company could support dedicated management, I chose to return fully to engineering.

2021 – 22

Software Engineer · DuckDB Labs

Python client, zonemaps, ART index.

2021 – 22

Post-Doctoral Researcher · CWI Amsterdam

2019

Research Intern · Microsoft Research, DMX group

JIT-compiled execution engines for SQL Server.

2017 – 21

PhD, Computer Science · CWI / Leiden University

Thesis: Progressive Indexes.

Research

VLDB 2019 Progressive Indexes: Indexing for Interactive Data Analysis, by Pedro Holanda, Mark Raasveldt, Stefan Manegold, and Hannes Mühleisen code
ICDE 2021 Multidimensional Adaptive & Progressive Indexes, by Matheus Nerone, Pedro Holanda, Eduardo Almeida, and Stefan Manegold code
SIGMOD DBTest 2018 Fair Benchmarking Considered Difficult: Common Pitfalls in Database Performance Testing, by Mark Raasveldt, Pedro Holanda, Tim Gubner, and Hannes Mühleisen code
PhD Thesis 2021 Progressive Indexes, Leiden University / CWI

Writing

2026 Data Inlining in DuckLake: Unlocking Streaming for Data Lakes · DuckLake Blog
2025 Into the CSV Abyss: DuckDB and the Pollock Robustness Benchmark · DuckDB Blog
2024 CSV Files: Dethroning Parquet as the Ultimate Storage Format, or Not? · DuckDB Blog
2021 DuckDB quacks Arrow: Zero-Copy Integration with Apache Arrow · DuckDB Blog

More on the DuckDB blog and ducklake.select.

Talks

3.4K+ views

Interviews

2026 DuckDB uses RDBMS to reimagine the lakehouse · The Register
2023 DuckDB: All the Benefits of a Database, None of the Hassle · Inspiring Computing podcast

Contact

Always happy to talk about database internals, query engines, or open source. Write to me at pedro@duckdblabs.com