Great article! How did you handle data contracts for third-party data sources like Salesforce, Stripe, Google Analytics, etc.? Also, was there a single Dataflow job handling all sources, or were separate Dataflow jobs started per source type, source, target, etc.?
Thanks. If a third-party system such as Salesforce had a technical system owner, the contract should fall on that team as the data producer. If there was none, it was usually the central data platform team who took care of it. We planned to extend the CLI to generate a contract template from the third-party system's API to streamline that process.
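To make the ownership rule and the planned template generation a bit more concrete, here is a minimal Python sketch. It is not the actual CLI: the input shape (a list of field dicts with `name`, `type` and `nillable`), the `TYPE_MAP`, the `ContractTemplate` class and the `"central-data-platform"` team name are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical mapping from a third-party API's field types to contract types.
TYPE_MAP = {
    "id": "string",
    "string": "string",
    "boolean": "boolean",
    "int": "integer",
    "double": "float",
    "date": "date",
    "datetime": "timestamp",
}


@dataclass
class ContractTemplate:
    source: str
    owner: str
    fields: list = field(default_factory=list)


def resolve_owner(system_owner: Optional[str]) -> str:
    # The team owning the third-party system acts as the data producer;
    # without one, the (hypothetically named) central platform team owns it.
    return system_owner or "central-data-platform"


def contract_from_api_fields(
    source: str, api_fields: list, system_owner: Optional[str] = None
) -> ContractTemplate:
    fields = []
    for f in api_fields:
        fields.append({
            "name": f["name"],
            # Unmapped types are marked "unknown" so a human reviews them.
            "type": TYPE_MAP.get(f["type"], "unknown"),
            "required": not f.get("nillable", True),
        })
    return ContractTemplate(source=source, owner=resolve_owner(system_owner), fields=fields)
```

The generated template is only a starting point; the owning team would still review field semantics and quality rules by hand.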
We had a single job for all sources, but considered splitting it into several to cater for different priorities and domains, and to improve reliability in case one data producer affected others through sudden spikes or similar. Strictly speaking, we had two jobs: one for live processing and one for backfills, so that a backfill could never introduce latency into the live feed.
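A rough sketch of the live/backfill split, assuming each job reads from its own input topic and that events carry a backfill flag. The topic naming scheme, the `Event` fields and the optional per-domain split (the "multiple jobs" option that was only considered) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    domain: str
    producer: str
    payload: dict = field(default_factory=dict)
    is_backfill: bool = False


def route(event: Event, split_by_domain: bool = False) -> str:
    """Return the (hypothetical) topic whose consuming job should process the event.

    Backfill traffic always goes to a separate lane, so a large backfill cannot
    add latency to the live job. Optionally, each domain also gets its own lane,
    isolating producers from each other's spikes.
    """
    lane = "backfill" if event.is_backfill else "live"
    if split_by_domain:
        return f"events-{event.domain}-{lane}"
    return f"events-{lane}"
```

With `split_by_domain=False` this corresponds to the two-job setup described above; turning it on models the per-domain alternative.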