In today's data-centric organizations, the central data engineering team often finds itself at a crossroads. Traditionally focused on building and operating data pipelines, these teams are increasingly overwhelmed, becoming bottlenecks as demand for data products explodes. However, this challenge presents a significant opportunity: to evolve from a reactive data engineering function to a proactive data platform engineering team. The goal? To transform from a constraint into the ultimate enabler, fostering self-service, scalability, and a thriving data ecosystem that empowers data product developers across the organization.
The key to this transformation lies in strategically adopting Platform Engineering principles, and at the heart of this approach are Golden Paths. For aspiring data platform engineering teams, golden paths are not just about optimizing workflows; they are about fundamentally shifting the operating model. They provide well-defined, supported, and platform-orchestrated routes that empower data product developers to build and deploy data products rapidly, reliably, and autonomously. This shift is akin to how operational platform teams have empowered application developers through self-service infrastructure – now, it's data's turn.
In this post, I will focus on how Golden Paths are the catalyst for building a scalable, distributed, and truly self-service data product development model. In upcoming posts, I will dive deeper into the crucial roles of Internal Developer Platforms and Platform Orchestrators in enabling this transformation.
1. WHAT are Golden Paths?
Think of Golden Paths as the well-paved roads on your data platform. They are predefined, opinionated, and fully supported workflows designed by the data platform engineering team to guide data product developers through common tasks. By offering these clear, streamlined routes, golden paths reduce complexity, ensure adherence to best practices, and ultimately lead to faster and more reliable data product development – directly accelerating your data flywheel. They minimize cognitive load, allowing data product developers to focus on building valuable data products instead of wrestling with infrastructure complexities.
2. HOW to build a Golden Path - artifacts
To understand Golden Paths beyond the "well-paved road" metaphor, let's get concrete. A Golden Path is a curated and opinionated bundle of artifacts and resources that, when combined, provide a streamlined and automated workflow for a specific data product development task.
Here's a breakdown of the typical artifacts associated with a Golden Path in a data platform context:
Workload configurations
Define the desired state of a data workload or component in a declarative and repeatable way. This moves away from manual, ad-hoc configurations.
YAML or JSON manifests: For defining data pipelines (e.g., Spark jobs, Flink applications), data models (e.g., dbt models), data exports, or infrastructure components required for the workflow.
Configuration-as-Code repositories: Version-controlled repositories storing these configuration files, enabling auditability and collaboration.
Parameterizable templates: Reusable configuration templates that can be customized with specific parameters (e.g., data source names, target destinations, resource allocations) for different use cases.
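To make the idea of a parameterizable workload configuration concrete, here is a minimal sketch of what such a template might look like and how it could be rendered with Python. The manifest fields (kind, source, target, schedule, resources) are illustrative assumptions, not a standard schema; your platform would define its own.

```python
import yaml  # PyYAML; assumes the platform accepts YAML manifests, as described above

# Illustrative, parameterizable manifest template; all field names are hypothetical.
MANIFEST_TEMPLATE = """
kind: data-pipeline
name: {product_name}-daily-load
source:
  connection: {source_connection}
  table: {source_table}
target:
  dataset: curated.{product_name}
schedule: "0 4 * * *"
resources:
  spark_executors: {executors}
"""

def render_manifest(product_name: str, source_connection: str,
                    source_table: str, executors: int = 2) -> dict:
    """Fill the template with use-case-specific parameters and parse it into a dict."""
    rendered = MANIFEST_TEMPLATE.format(
        product_name=product_name,
        source_connection=source_connection,
        source_table=source_table,
        executors=executors,
    )
    return yaml.safe_load(rendered)

if __name__ == "__main__":
    manifest = render_manifest("orders", "postgres-prod", "public.orders")
    print(manifest["target"]["dataset"])  # -> curated.orders
```

Keeping templates like this in a version-controlled configuration repository provides exactly the auditability and collaboration described above.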
Code templates & scaffolding
Provide starting points and best-practice examples for developers to build upon, reducing boilerplate code and promoting consistency.
Starter code repositories: Pre-configured repositories with skeleton code, dependency management, and basic project structure for common data product types (e.g., a starter dbt project, a template for a data API, a basic machine learning pipeline structure).
Code snippets and libraries: Reusable code blocks or libraries that encapsulate common data engineering patterns and best practices (e.g., standardized data quality checks, logging utilities, connection handling functions). A minimal sketch of such a helper follows after this list.
CLI tools or SDKs: Command-line interfaces or Software Development Kits that provide commands and functions to interact with the golden path and platform programmatically.
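As an illustration of the "code snippets and libraries" artifact, here is a small sketch of the kind of reusable data quality helpers a platform team might ship in a shared library. The function names and checks are assumptions made for this example; the point is that developers import a vetted pattern instead of re-implementing it in every repository.

```python
import logging
import pandas as pd

logger = logging.getLogger("platform.data_quality")  # hypothetical shared logger name

def check_not_null(df: pd.DataFrame, columns: list[str]) -> dict[str, int]:
    """Standardized not-null check: returns the count of null values per column."""
    null_counts = {col: int(df[col].isna().sum()) for col in columns}
    for col, count in null_counts.items():
        if count > 0:
            logger.warning("Column %s has %d null values", col, count)
    return null_counts

def check_unique_key(df: pd.DataFrame, key_columns: list[str]) -> bool:
    """Standardized uniqueness check for a (composite) business key."""
    duplicates = int(df.duplicated(subset=key_columns).sum())
    if duplicates > 0:
        logger.warning("Found %d duplicate rows for key %s", duplicates, key_columns)
    return duplicates == 0

# Usage inside a data product repository scaffolded from a starter template:
# orders = pd.read_parquet("data/orders.parquet")
# check_not_null(orders, ["order_id", "customer_id"])
# check_unique_key(orders, ["order_id"])
```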
Automated pipelines
Automate the steps involved in building, testing, deploying, and managing data workloads defined by the golden path.
CI/CD pipeline definitions: Configurations for CI/CD systems (e.g., Jenkins, GitLab CI, GitHub Actions) that automate testing, building, and deployment of data products.
Data workflow orchestration definitions: Configurations for data workflow orchestration tools (e.g., Airflow, Dagster, Prefect) that manage the execution and dependencies of data pipelines defined by the golden path. A minimal sketch of such a definition follows after this list.
Automated testing suites: Pre-built test suites (data quality tests, unit tests, integration tests) that are automatically executed as part of the golden path workflow.
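To illustrate the data workflow orchestration definitions, here is a deliberately minimal Airflow DAG of the kind a golden path could generate from a workload manifest (assuming Airflow 2.4+, where the `schedule` argument is available; the task logic is reduced to placeholders).

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    """Pull raw data from the source system (placeholder for the real logic)."""
    print("extracting raw orders")

def run_quality_checks():
    """Run the platform's standardized data quality suite (placeholder)."""
    print("running data quality checks")

def load_to_curated():
    """Load validated data into the curated layer (placeholder)."""
    print("loading to curated.orders")

# A minimal DAG of the kind a golden path could render from a declarative manifest.
with DAG(
    dag_id="orders_daily_load",
    start_date=datetime(2024, 1, 1),
    schedule="0 4 * * *",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    quality_task = PythonOperator(task_id="quality_checks", python_callable=run_quality_checks)
    load_task = PythonOperator(task_id="load_curated", python_callable=load_to_curated)

    extract_task >> quality_task >> load_task
```

In a golden path, developers would rarely hand-write this file; the platform would generate it from the declarative manifest and wire in the standardized quality checks automatically.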
Documentation
Guide developers on how to use the golden path effectively, understand its components, and troubleshoot common issues. Documentation is crucial for self-service adoption.
Getting started guides: Step-by-step tutorials explaining how to use the golden path for a specific task.
Reference documentation: Detailed documentation of each component of the golden path, configuration options, and best practices.
FAQ and troubleshooting guides: Addressing common questions and providing solutions to typical issues developers might encounter.
Architecture diagrams: Visual representations of the golden path workflow and its components.
Tooling & platform integrations
Leverage existing platform tools and services to provide a seamless and integrated developer experience within the golden path.
Platform APIs and CLIs: Integrations with the data platform's APIs and command-line tools to enable programmatic interaction with the golden path. A hypothetical example follows after this list.
Self-service UI elements: Optional user interface components (portals, dashboards) that provide a visual way to interact with and monitor golden path workflows.
Monitoring and logging dashboards: Pre-configured dashboards that visualize the performance and health of workloads deployed through the golden path, integrated with the platform's monitoring and logging systems.
Security and governance guardrails: Policies and controls embedded within the golden path to ensure security and compliance are automatically enforced.
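To illustrate the "Platform APIs and CLIs" integration point, here is a hypothetical example of triggering a golden path programmatically. The endpoint, payload shape, and token handling are invented for illustration; a real platform (a Backstage-based portal, a commercial orchestrator, or a home-grown API) would expose its own interface.

```python
import os
import requests

# Hypothetical endpoint and credentials; adapt to whatever API your platform actually exposes.
PLATFORM_API = os.environ.get("PLATFORM_API", "https://platform.internal/api/v1")
PLATFORM_TOKEN = os.environ.get("PLATFORM_TOKEN", "")  # issued by the platform's identity service

def trigger_golden_path(path_name: str, manifest: dict) -> dict:
    """Submit a manifest to a (hypothetical) golden path endpoint and return the run metadata."""
    response = requests.post(
        f"{PLATFORM_API}/golden-paths/{path_name}/runs",
        json=manifest,
        headers={"Authorization": f"Bearer {PLATFORM_TOKEN}"},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()

# Example: kick off a data on-boarding run for a new source.
# run = trigger_golden_path("data-onboarding", {"source": "postgres-prod", "table": "public.orders"})
# print(run["status"], run["run_id"])
```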
By providing these concrete components, platform teams can move beyond abstract concepts and deliver real value to their data teams, enabling them to build data products faster and more reliably.
3. WHY - The data platform engineering opportunity
The opportunity for central teams is clear: evolve from directly doing all the data engineering to building a platform that empowers others. Golden paths are the mechanism for this shift: implementing them creates a scalable and efficient model that benefits both the platform team and the data product developers.
What are data products?
Before we delve deeper into the benefits for data platform teams and data product teams, let's briefly define data products. In essence, a data product is anything your teams build that delivers value using data. Examples include:
Analytical dashboards and reports: Providing insights into business performance and trends.
Machine learning models: Automating decisions, predictions, and personalization.
Data APIs: Enabling applications to access and utilize curated data programmatically.
Data pipelines: Transforming and moving data to make it usable.
Curated datasets: Cleaned, transformed, and enriched data ready for self-service consumption.
Golden paths are designed to streamline the creation and deployment of these diverse data products, empowering data product developers to build them themselves using platform-provided capabilities.
Benefits for data product developers
Golden Paths' true power shines in the hands of data product developers. By providing these well-defined and self-service routes, we unlock significant advantages directly for those building data products every day. Let's explore the key ways Golden Paths empower data product developers and transform their experience:
Accelerated innovation and time-to-value: Self-service golden paths dramatically accelerate development cycles, empowering developers to bring data products to life faster and respond quickly to business needs.
Reduced cognitive burden: Developers are freed from the complexities of underlying infrastructure and data plumbing, and can concentrate on the core logic, business value, and unique features of their data products.
Increased autonomy and self-sufficiency: Developers gain autonomy and control over their data product development life-cycle, reducing dependencies on central teams and fostering faster iteration and innovation.
Consistent and reliable development experience: Golden paths provide a consistent and reliable development experience, ensuring predictable outcomes and reducing the learning curve for new data technologies and best practices.
Benefits for data platform engineering teams
So, we've explored how Golden Paths empower data product developers, but the advantages extend powerfully to the data platform engineering teams themselves. By strategically building and offering these self-service pathways, platform teams can fundamentally reshape their role and unlock a new level of effectiveness. Let's delve into the key benefits that Golden Paths deliver directly to data platform engineering teams:
Shift to strategic platform building: The team transitions from being a reactive service provider to a proactive platform builder and operator. Focus shifts to strategic platform evolution, new feature development, and improving the overall developer experience.
Scalable support model: Self-service automation through golden paths enables the platform team to support a growing number of data product developers without linearly increasing the platform team size. Scalability is achieved through platform capabilities, not manual effort.
Measurable platform adoption and value: Platform adoption becomes directly measurable through golden path usage metrics. The platform team can track time-to-value for data products built using the platform and demonstrate the ROI of platform investments.
Centralized governance and standardization: The platform enforces consistent governance and standardization through orchestrated golden paths, ensuring compliance, security, and data quality across all data products.
4. Golden Paths as platform services - examples
Let's revisit the examples, now focusing on how golden paths are delivered as platform-orchestrated, self-service capabilities. But first, a brief clarification of “orchestration”.
Clarifying the orchestration confusion
It's important to distinguish platform orchestration from data orchestration.
Data orchestration, often associated with tools like Airflow, Dagster, and Prefect, focuses on managing the flow of data within and across data environments. As commonly defined, “the orchestration layer refers to a layer of the modern data platform that empowers data teams to more easily manage the flow of data within and across data environments.” These tools are crucial for automating data pipelines.
Platform orchestration, in the context of Platform Engineering and this blog post, is a broader concept. It's about orchestrating the entire platform – infrastructure, services, tools, and workflows – to enable self-service capabilities for data product developers. The Platform Orchestrator is the engine of the Internal Developer Platform (IDP), automating the provisioning of resources, deployment of services, enforcement of policies, and the execution of golden paths. While data orchestration tools might be components within a data platform and potentially orchestrated by the Platform Orchestrator, platform orchestration itself operates at a higher level, focusing on the overall developer experience and platform management.
Example 1: Self-service data on-boarding
The data platform engineering team provides a fully orchestrated data on-boarding golden path as a platform service.
Platform Orchestrator for data intake: Data producers interact with the platform via CLI or API, defining data source configurations in YAML or JSON. The Platform Orchestrator processes these configurations to initiate onboarding (a simplified sketch of this orchestrated flow follows after this list).
Platform-driven schema management: The Platform Orchestrator triggers automated schema inference and registration pipelines, using platform-managed schema registry services.
Platform-enforced data quality validation: The Platform Orchestrator invokes a centralized data quality engine (a platform service) to validate data against predefined rules.
Platform-automated data integration: The Platform Orchestrator manages data integration pipelines (platform services) to land data in curated layers.
Platform-integrated data catalog: The Platform Orchestrator ensures automatic metadata registration in the platform's data catalog.
Data platform engineering team focus: Building and operating the Platform Orchestrator, platform services (schema registry, data quality engine, data catalog, integration pipelines), and defining the declarative configurations and workflows for the golden path. Monitoring platform health and golden path adoption.
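As promised above, here is a highly simplified sketch of the orchestrated sequence behind this golden path. The request model, function names, and printed output are illustrative stand-ins for real platform services (schema registry, data quality engine, integration pipelines, data catalog), not a specific product's API.

```python
from dataclasses import dataclass, field

@dataclass
class OnboardingRequest:
    """Hypothetical request model derived from a data producer's YAML/JSON configuration."""
    source_name: str
    connection: str
    table: str
    quality_rules: list = field(default_factory=list)

def infer_and_register_schema(req: OnboardingRequest) -> dict:
    """Stub for the platform's schema inference and registry service."""
    print(f"registering schema for {req.connection}.{req.table}")
    return {"columns": ["order_id", "customer_id", "amount"]}

def validate_quality(req: OnboardingRequest, schema: dict) -> None:
    """Stub for the centralized data quality engine."""
    print(f"running {len(req.quality_rules)} quality rules against {req.source_name}")

def land_in_curated_layer(req: OnboardingRequest, schema: dict) -> None:
    """Stub for the managed data integration pipelines."""
    print(f"loading {req.source_name} into the curated layer")

def register_in_catalog(req: OnboardingRequest, schema: dict) -> None:
    """Stub for automatic metadata registration in the data catalog."""
    print(f"cataloguing {req.source_name} with {len(schema['columns'])} columns")

def onboard(req: OnboardingRequest) -> None:
    """The orchestrated sequence behind the on-boarding golden path."""
    schema = infer_and_register_schema(req)
    validate_quality(req, schema)
    land_in_curated_layer(req, schema)
    register_in_catalog(req, schema)

if __name__ == "__main__":
    onboard(OnboardingRequest("orders", "postgres-prod", "public.orders",
                              quality_rules=["not_null:order_id", "unique:order_id"]))
```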
Example 2: Self-service data modeling
The data platform engineering team delivers a streamlined, platform-orchestrated data modeling golden path.
Platform-provisioned development environments: Data product developers request development environments via the platform API or CLI. The Platform Orchestrator dynamically provisions pre-configured environments (containers, cloud IDEs) with necessary tools and libraries.
Platform-managed version control integration: The Platform Orchestrator integrates with platform-provided Git services or enforces organizational Git usage.
Platform-orchestrated automated testing: The Platform Orchestrator triggers platform-provided testing frameworks to execute data model tests as part of the golden path workflow (an illustrative test sketch follows after this list).
Platform-automated data model CI/CD: The Platform Orchestrator manages CI/CD pipelines (platform services) for building, testing, and deploying data models to different environments.
Platform-integrated data lineage: The Platform Orchestrator leverages platform-level data lineage services to automatically track lineage for data models.
Data platform engineering team focus: Building and operating the Platform Orchestrator, development environment provisioning services, CI/CD pipelines, testing frameworks, and data lineage services. Defining the golden path and ensuring seamless orchestration.
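To illustrate the automated testing step, here is a minimal, hypothetical data model test the platform's testing framework could run as part of this golden path. In practice this might be a dbt test or a Great Expectations suite; a plain pytest-and-pandas version is shown purely for illustration, with a faked sample standing in for the built model.

```python
import pandas as pd
import pytest

@pytest.fixture
def daily_revenue() -> pd.DataFrame:
    # In the real golden path this would load the built model from the
    # provisioned development environment's warehouse; here we fake a small sample.
    return pd.DataFrame({
        "order_date": ["2024-01-01", "2024-01-02"],
        "revenue": [1250.0, 980.5],
    })

def test_primary_key_is_unique(daily_revenue: pd.DataFrame):
    """The model's grain (one row per order_date) must hold."""
    assert not daily_revenue["order_date"].duplicated().any()

def test_revenue_is_non_negative(daily_revenue: pd.DataFrame):
    """A basic business-rule check the platform can enforce on every deployment."""
    assert (daily_revenue["revenue"] >= 0).all()
```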
Example 3: Self-service data activation
Data platform engineering empowers self-service data activation through a platform-orchestrated golden path.
Platform Orchestrator for data export configuration: Data consumers use a CLI or API to define data export configurations. The Platform Orchestrator processes these configurations to initiate data activation workflows.
Platform-enforced security and data masking: The Platform Orchestrator enforces platform-level security policies and data masking rules during data activation (a conceptual masking sketch follows after this list).
Platform-integrated auditing and monitoring: The Platform Orchestrator leverages platform-wide auditing and monitoring services to track data activation events.
Platform-provided data destination connectors: The platform offers a catalog of pre-built connectors, orchestrated by the Platform Orchestrator, for seamless data delivery to various destinations.
Platform-driven usage tracking and cost management: The Platform Orchestrator integrates with platform-level usage tracking and cost management services to provide visibility and control over data activation costs.
Data platform engineering team focus: Building and operating the Platform Orchestrator, security services, monitoring services, connector library, and usage/cost management services. Designing the data activation golden path and ensuring seamless platform orchestration.
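As a sketch of the masking step, here is one way platform-enforced masking could look conceptually. The column names, policy format, and hashing approach are assumptions made for the example; a real platform would apply centrally governed policies before any data leaves the curated layer.

```python
import hashlib
import pandas as pd

# Hypothetical masking policy applied by the platform before export; illustrative only.
MASKING_POLICY = {"email": "hash", "phone_number": "drop"}

def apply_masking(df: pd.DataFrame, policy: dict[str, str]) -> pd.DataFrame:
    """Return a copy of the export dataframe with policy-mandated masking applied."""
    masked = df.copy()
    for column, rule in policy.items():
        if column not in masked.columns:
            continue
        if rule == "hash":
            masked[column] = masked[column].map(
                lambda value: hashlib.sha256(str(value).encode()).hexdigest()
            )
        elif rule == "drop":
            masked = masked.drop(columns=[column])
    return masked

# customers = pd.DataFrame({"customer_id": [1], "email": ["a@example.com"], "phone_number": ["555-0100"]})
# export_ready = apply_masking(customers, MASKING_POLICY)
```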
By focusing on building a robust Internal Developer Platform and Platform Orchestrator, data platform engineering teams can deliver powerful, self-service golden paths that truly empower data product developers and drive data innovation at scale.
5. Identifying and prioritizing Golden Path candidates
Not every workflow is a prime candidate for a golden path. Platform teams need a systematic approach to identify and prioritize the most impactful opportunities.
Identifying candidates
Creating effective Golden Paths isn't about building them for every possible workflow; it's about strategic focus. To maximize impact and ROI, data platform engineering teams need a clear process for identifying the right candidates for Golden Paths. Let's explore a few key criteria to help you pinpoint the most promising workflows to pave first:
High-frequency tasks: Workflows that are performed frequently by data product developers are strong candidates. Automating these will yield significant time savings and productivity gains.
Pain points and bottlenecks: Identify areas where data product developers experience friction, delays, or require frequent support from the platform team. These are prime opportunities to streamline processes with golden paths.
Repetitive and manual processes: Look for tasks that involve significant manual effort, configuration, or scripting. Automation through golden paths can eliminate toil and reduce errors.
Areas with high impact on data product quality or reliability: Workflows critical to ensuring data quality, security, compliance, or performance are excellent candidates for standardization and automation.
Alignment with strategic goals: Prioritize golden paths that directly support key organizational objectives, such as accelerating data-driven innovation, improving data governance, or reducing time-to-market for data products.
Prioritizing candidates
Once you've identified promising Golden Path candidates, the next crucial step is prioritization. With limited resources and time, data platform engineering teams need a framework to decide which Golden Paths to build first to maximize their impact. Here are key factors to consider when prioritizing your Golden Path roadmap:
Impact vs. Effort: Evaluate the potential impact of each golden path candidate (e.g., time savings, quality improvement, risk reduction) against the effort required to build and maintain it. Focus on high-impact, feasible paths first.
Developer feedback and demand: Engage with data product developers to understand their biggest pain points and prioritize golden paths that address their most pressing needs. High demand indicates likely adoption.
Platform team capacity: Consider the platform team's current capacity and expertise. Start with golden paths that align with the team's skills and resources.
Dependencies and prerequisites: Assess dependencies between potential golden paths. Some paths may build upon others, requiring a phased roll-out.
Quick wins vs. long-term value: Balance quick wins that demonstrate immediate value with longer-term, more strategic golden paths that will deliver sustained benefits over time.
By systematically identifying and prioritizing golden path candidates, platform teams can ensure they are focusing their efforts on the most impactful initiatives and maximizing their return on investment.
6. Key principles for data platform engineering teams building Golden Paths
For central data platform engineering teams embarking on the golden path journey, consider these key principles to ensure success and drive adoption:
Prioritize developer experience above all else
Golden paths are only effective if they are embraced by data product developers. Focus relentlessly on creating a seamless, intuitive, and productive developer experience. Solicit continuous feedback and iterate based on user needs.
Measure and monitor adoption metrics
Track key metrics such as golden path usage, time-to-value for data products built using golden paths, developer satisfaction, and reduction in support requests. Use these metrics to guide prioritization and demonstrate the value of golden paths.
Start small, iterate fast, expand incrementally
Don't attempt to build a comprehensive suite of golden paths upfront. Start with one or two high-impact workflows, iterate rapidly based on feedback and adoption data, and expand the golden path portfolio incrementally.
Treat Golden Paths as platform products
Apply product management principles to your golden paths. Define clear value propositions, stakeholders, roadmaps, release cycles, and support models. Market and promote your golden paths internally to drive adoption.
Foster a community around Golden Paths
Encourage collaboration and knowledge sharing among data product developers using golden paths. Create documentation, tutorials, and support channels to build a thriving community and facilitate peer-to-peer learning.
Continuously evolve and adapt
The data landscape is constantly evolving. Regularly review and update your golden paths to incorporate new technologies, best practices, and address emerging user needs. Embrace a culture of continuous improvement.
Summary
Golden paths, powered by a robust Internal Developer Platform and Platform Orchestrator, are the strategic imperative for central data platform engineering teams seeking to transform their role and unlock data innovation at scale. By shifting from a data engineering bottleneck to a platform engineering booster, these teams can empower data product developers across the organization to build and deploy data products with unprecedented speed, reliability, and autonomy.
For data platform leaders and engineers, the path forward is clear: Embrace platform engineering, build your Internal Developer Platform, orchestrate golden paths, and empower your data product developers. This is how you move beyond simply managing data to truly driving data-driven success.
In future posts, I will delve deeper into the architecture and implementation of Internal Developer Platforms and Platform Orchestrators – the essential engines behind successful golden paths.
Have you implemented golden paths for data platforms? What challenges and successes have you encountered? Please share your thoughts and contribute to the conversation in the comments below!
I'm committed to keeping this content free and accessible for everyone interested in data platform engineering. If you find this post valuable, the most helpful way you can support this effort is by giving it a like, leaving a comment, or sharing it with others via a restack or recommendation. It truly helps spread the word and encourage future content!