
Dr. Pedro Holanda
Principal Engineer @ DuckDB Labs
Selected work: led DuckLake from its 0.1 pre-production version to 1.0, and built the CSV engine that ranks highest among all tested systems on the 2025 Pollock robustness benchmark.
PhD from CWI/Leiden University with publications at VLDB and ICDE. Former Microsoft Research intern. Served as COO of DuckDB Labs, helping build the company from a CWI research spin-out before returning to engineering full-time.
About
I joined DuckDB at CWI Amsterdam when the project had two researchers, only a handful of users, and a bet that analytical databases could be radically simpler. I have been building its core infrastructure from the early days.
When DuckDB Labs spun out of CWI, I took on the COO role - helping hire the early team, organizing events, shaping the open-source strategy, and building the company while continuing to ship code. That experience gave me a perspective on the full lifecycle of open-source infrastructure that informs every design decision I make today. When the company was ready for dedicated operations leadership, I chose to return to engineering full-time.
As Principal Engineer, I led DuckLake from its 0.1 pre-production version to its 1.0 release - including data inlining that delivers 926× faster queries and 105× faster ingestion versus Iceberg - and I own query-processing performance for the engine. I mentor new contributors, lead design reviews across subsystems, and help set technical direction for the engine.
Engineering Principles
The future of analytical databases is in-process. Moving data to the database is the wrong abstraction - the database should come to the data. That is the bet I made when I joined DuckDB.
CSV will never die. Instead of replacing messy formats, build systems that handle their full complexity transparently. That philosophy drives the work I do on DuckDB's data ingestion layer.
Databases should meet users where they are. That is why I built DuckDB's zero-copy Arrow integration and ADBC - the best data system works seamlessly with every tool in your stack, not the other way around.
Engineering Contributions
DuckLake
Led the development of DuckDB's integrated data lake format from its 0.1 pre-production version to its 1.0 release. DuckLake is a SQL-native catalog that stores table metadata in any database while data lives in open formats like Parquet on object storage.
CSV Engine
Designed and built DuckDB's parallel CSV parser with automatic type, delimiter, and dialect detection. Scores highest on the Pollock robustness benchmark (2025) across all tested systems.
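For readers new to dialect detection, the basic idea can be sketched with Python's standard library. This is not DuckDB's sniffer and says nothing about how it works internally; it uses `csv.Sniffer` as a stand-in, and the `detect_dialect` helper and the sample data are hypothetical names made up for illustration.

```python
import csv
import io


def detect_dialect(sample: str, candidates: str = ",;|\t"):
    """Guess delimiter, quote character, and header presence from a text sample.

    Illustrative only: relies on the stdlib csv.Sniffer heuristics,
    not on DuckDB's detection logic.
    """
    sniffer = csv.Sniffer()
    # Restrict the search to a small set of plausible delimiters.
    dialect = sniffer.sniff(sample, delimiters=candidates)
    return dialect.delimiter, dialect.quotechar, sniffer.has_header(sample)


# Hypothetical semicolon-separated sample with a header row.
sample = "id;name;price\n1;apple;0.50\n2;pear;0.75\n3;plum;1.20\n"

delimiter, quotechar, has_header = detect_dialect(sample)

# Parse the sample using the detected dialect instead of hard-coded settings.
rows = list(csv.reader(io.StringIO(sample), delimiter=delimiter, quotechar=quotechar))
```

The point of the sketch is the workflow: inspect a sample, infer the dialect, then parse with the inferred settings so the user never has to specify them.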
Arrow & ADBC Integration
Built the zero-copy integration between DuckDB and Python's data ecosystem via Apache Arrow. ADBC provides a modern connectivity standard that eliminates the serialization overhead of ODBC.
ART Persistent Storage
Designed and implemented the persistent storage layer for DuckDB's Adaptive Radix Tree indexes. Keeps index data durable on disk without sacrificing in-memory lookup speed.
Python Client
Built DuckDB's Python client foundations and the UDF framework that lets users extend the engine in pure Python - no C++ required.
BIGNUM Implementation
Implemented arbitrary-precision integer arithmetic for DuckDB. Handles HUGEINT and DECIMAL types so that financial calculations and scientific measurements stay exact beyond 64-bit limits.
Open Source Projects
DuckDB
38K+
The open-source analytical in-process database. Early core contributor since the research prototype - 580+ merged pull requests.
DuckLake
2.7K+
Integrated data lake and catalog format designed to work with DuckDB. Led its development - 180+ merged pull requests.
Career Timeline
Principal Engineer
DuckDB Labs
Software Engineer
DuckDB Labs
Returned to full-time engineering. Built the CSV engine, ART storage, and ADBC integration.
Chief Operating Officer
DuckDB Labs
Helped hire the early team, shaped the open-source strategy, and built company operations while continuing to ship code.
Post-Doctoral Researcher
CWI Amsterdam
Concurrent with the COO role, during the CWI spin-out.
Research Intern
Microsoft Research
DMX group. JIT-compiled execution engines for SQL Server.
PhD in Computer Science
CWI / Leiden University
Talks & Media
I speak on database internals and query-engine design at FOSDEM, EuroPython, and data community events.
Efficient CSV Parsing: On the Complexity of Simple Things
Selected Publications
Peer-reviewed work on progressive indexing, database benchmarking, and analytical query processing. The same research depth behind these VLDB and ICDE papers ships in DuckDB and DuckLake today.
Multidimensional Adaptive & Progressive Indexes
Progressive Indexes
Dissecting DuckDB: The internals of the SQLite for Analytics
Progressive Indexes: Indexing for Interactive Data Analysis
Fair Benchmarking Considered Difficult: Common Pitfalls In Database Performance Testing
Blog Posts
Showing selected posts. Full list on the DuckDB Blog and DuckLake Blog.
Introduces DuckLake's data inlining feature, which stores small updates directly in the catalog database, eliminating the small-files problem and achieving 926× faster queries and 105× faster ingestion compared to Iceberg.
Introduces the zero-copy integration between DuckDB and Apache Arrow that became the default way to move data between DuckDB and the Python ecosystem.
Testing DuckDB's CSV parser against the Pollock robustness benchmark - the most adversarial collection of real-world CSV files available.
A benchmark comparison of CSV and Parquet ingestion performance. The results are more nuanced than the conventional wisdom suggests.
DuckDB's implementation of the Arrow Database Connectivity standard, providing a modern alternative to ODBC for high-throughput data transfer.
Get in Touch
I am always up for a conversation about database internals, query engine design, or open-source collaboration. Feel free to reach out.