dlt is an open-source library that automatically creates datasets out of messy, unstructured data sources. You can use it to move data from just about anywhere into most well-known SQL and vector stores, data lakes, storage buckets, or local engines like DuckDB. It automates many cumbersome data engineering tasks and can be used by anyone who knows Python.
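To give a flavor of the library, here is a minimal sketch of a dlt pipeline loading a few records into a local DuckDB database. The pipeline name, dataset name, and sample rows are made up for illustration.

```python
import dlt

# declare a pipeline that loads into a local DuckDB file
pipeline = dlt.pipeline(
    pipeline_name="iceberg_demo",   # hypothetical name
    destination="duckdb",
    dataset_name="demo_data",
)

# any iterable of dicts (or a dlt source/resource) can be loaded
rows = [
    {"id": 1, "name": "alice"},
    {"id": 2, "name": "bob"},
]

load_info = pipeline.run(rows, table_name="users")
print(load_info)
```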
We are looking for freelance help with building Iceberg support in dlt. We expect 3-4 weeks of full-time work, with follow-up projects in the future.
Tasks:
* Add Iceberg support to existing dlt destinations (Snowflake, BigQuery, etc.) using vendor-supported mechanisms.
* Add support to read Iceberg data directly from Python via dlt-supported interfaces (based on PyIceberg or DuckDB); see the sketch after this list.
* Optionally, work on direct Iceberg support in dlt (writing data, catalog support, table maintenance).
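For the reading task, a rough sketch of the two access paths mentioned above (PyIceberg and DuckDB's iceberg extension) follows. The catalog name, table identifier, and table location are hypothetical placeholders, not part of any existing dlt interface.

```python
import duckdb
from pyiceberg.catalog import load_catalog

# Path 1: read an Iceberg table through PyIceberg
# "default" catalog and "demo.users" table are placeholders
catalog = load_catalog("default")
table = catalog.load_table("demo.users")
arrow_table = table.scan().to_arrow()  # Arrow data that dlt could normalize and load

# Path 2: read an Iceberg table with DuckDB's iceberg extension
con = duckdb.connect()
con.install_extension("iceberg")
con.load_extension("iceberg")
# the table location is a placeholder
result = con.sql("SELECT * FROM iceberg_scan('s3://bucket/warehouse/demo/users')")
print(result.fetchall())
```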
Requirements:
Ideal profile:
* you know dlt
* you know the Iceberg ecosystem and its technology: table structures, metastores, transactions
* you know Snowflake and BigQuery
* you know PyIceberg and DuckDB
We have a pretty good understanding of what we want to build and a codebase (existing destinations, DuckDB scanner access, PyIceberg) to extend. You can take a look at the dlt code.
You'll work on OSS and commercial projects.