Over the past week I gathered a few links to posts, repos and videos that I find particularly interesting and want to share.
Managed Iceberg tables in BigQuery with streaming ingestion from a Pub/Sub subscription! This looks awesome and enables a managed, serverless data lakehouse on Iceberg without Spark. (In preview.)
I am a fan of CLIs, and ingestr (open source) is one that lets you easily copy data between databases and data warehouses. It is built on dlt. https://github.com/bruin-data/ingestr
I’ve done my fair share of coding streaming data ingestion to BigQuery, as I’ve found existing solutions limited. However, I had completely missed the release (January 22) of Pub/Sub BigQuery subscriptions that use BigQuery table schemas! Imagine managed streaming ingestion of JSON into native BigQuery tables, with a setup that is easy to automate using Infrastructure as Code and data contracts! There is still one big issue GCP has to address, though: it doesn’t support date/time values in ISO formats, only integers representing Unix time. I have created a ticket in the GCP issue tracker to get this fixed, so please help by voting for it!
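Until that is fixed, one possible workaround is to convert ISO 8601 strings to integers in the publisher before the message is sent. The following is only a minimal sketch: the helper names `iso_to_epoch` and `convert_fields` and the field name `created_at` are hypothetical, naive timestamps are assumed to be UTC, and the unit the subscription expects (seconds, milliseconds or microseconds) is an assumption you should verify against the Pub/Sub documentation for your schema.

```python
from datetime import datetime, timezone

# Multipliers per second for each supported unit.
# Which unit your subscription expects is an assumption to verify.
_UNITS = {"s": 1, "ms": 1_000, "us": 1_000_000}
_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def iso_to_epoch(value, unit="us"):
    """Convert an ISO 8601 string to an integer Unix time in the given unit."""
    # Older Python versions do not accept a trailing "Z" in fromisoformat().
    dt = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if dt.tzinfo is None:
        # Assumption: timestamps without an offset are in UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    delta = dt - _EPOCH
    # Integer arithmetic avoids float rounding on large values.
    micros = (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds
    return micros * _UNITS[unit] // 1_000_000


def convert_fields(record, fields, unit="us"):
    """Return a copy of record with the named ISO fields replaced by integers."""
    out = dict(record)
    for name in fields:
        if out.get(name) is not None:
            out[name] = iso_to_epoch(out[name], unit)
    return out


# Hypothetical event with one timestamp field.
event = {"id": 1, "created_at": "1970-01-01T00:00:01Z"}
print(convert_fields(event, ["created_at"], unit="ms"))
# {'id': 1, 'created_at': 1000}
```

The converted dictionary can then be serialized to JSON and published as usual. The downside is that every producer has to carry this conversion, which is exactly why native ISO support would be so welcome.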
Great walkthrough by Zach Wilson of what One Big Table (OBT) models are and when to use them (and when not to). I like OBTs as they have great support in BigQuery, but they aren’t always the best option.
Salesforce data extraction was without question the messiest of all our sources, and likely reason enough on its own to use Fivetran just for that source. Now that BigQuery Data Transfer Service supports Salesforce, I can’t wait to try it out. Schedule a Salesforce transfer
The teams at my new employer have adopted Shape Up as their project management methodology. I am excited to get their feedback and some practical experience of my own, as I have never felt completely satisfied with the standard agile methods when applied to data (platform) engineering. This is a great video overview/teaser of the methodology by Ryan Singer (Basecamp).
At my previous employer (Mathem) we built our own data contract CLI to automate, deploy and build models, pipelines and tests. This CLI (open source) looks promising and is perhaps something to use and contribute to in my new job. https://github.com/datacontract/cli
Occasionally I post a single recommendation on my LinkedIn profile without creating a corresponding post on Substack, so follow me on LinkedIn if you want to catch those as well.