For the past decade, the data engineering world has been on a remarkable journey of maturation. We've diligently absorbed the wisdom of our software engineering counterparts, adopting crucial practices like version control, CI/CD, Infrastructure as Code (IaC), robust observability, and comprehensive testing methodologies. This evolution has fundamentally reshaped how we build and operate data systems, fostering greater reliability, scalability, and maintainability. But the journey isn't over. In fact, we stand at the cusp of a new era: the era of true data platform engineering.
We've undeniably built data platforms, intricate ecosystems of tools and technologies. However, too often, these platforms remain complex labyrinths, demanding specialized expertise and manual intervention for even the most routine tasks. What's needed now is a paradigm shift – a move beyond simply engineering a data platform to building a genuine internal developer platform (IDP) that empowers our users, accelerates innovation, and unlocks the full potential of our data. This isn't just about implementing new tools; it's about a fundamental change in mindset, a recognition that our "customers" are our internal data product teams.
From Orchestrators to Enablers: The Rise of the Data Platform Engineer
The analogy to software engineering's move from monolithic applications to microservices and developer platforms is apt. We're witnessing a similar transformation in the data realm. As I argued in "The data engineer is dead, long live the data platform engineer," the traditional data engineer role is evolving. We're moving away from being solely pipeline builders and data wranglers to becoming architects of self-service data experiences. Our focus shifts from the intricate details of individual transformations to building the golden paths that guide our users towards best practices, minimizing cognitive load, and enabling independent creation.
This transformation is about creating leverage. Instead of being the bottleneck for every data request, we empower our users to self-serve, accelerating the entire data value chain. It's about building a platform so intuitive and well-designed that "citizen data engineers" can confidently navigate and build upon it.
Unlocking Velocity and Reducing Friction: The Mathem Experience
At my previous employer, Mathem, we recognized this need for evolution and consciously turned our data engineering team into a platform engineering team for our analytical systems. This wasn't a superficial rebranding; it was a fundamental restructuring of our priorities and workflows. As detailed in "From pipelines to platform," we focused on building an IDP that provides self-service capabilities for the most common data product workflows.
Consider the crucial task of integrating new data sources. Previously, this could be a time-consuming, manual process. By introducing data contracts and building dedicated CLIs and IaC modules, we dramatically simplified it. The easier it is to onboard data, the more teams can use it, which in turn drives further innovation and platform improvements. This was directly reflected in our metrics: developer adoption of our self-service tools increased significantly, and the time to onboard new data streams dropped drastically.
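To make the idea of a data contract more concrete, here is a minimal sketch in Python of what a contract and a validation step might look like. It is not the Mathem implementation: the `ContractField` and `DataContract` classes, the `validate` function, the type names, and the `orders` example are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass

# Hypothetical contract model; names and supported types are illustrative only.
@dataclass(frozen=True)
class ContractField:
    name: str
    type: str              # e.g. "STRING", "INT64", "FLOAT64", "BOOL"
    required: bool = True


@dataclass(frozen=True)
class DataContract:
    stream: str
    owner: str
    fields: tuple[ContractField, ...]


# Mapping from contract type names to the Python types we accept for them.
_PYTHON_TYPES = {
    "STRING": str,
    "INT64": int,
    "FLOAT64": (int, float),
    "BOOL": bool,
}


def validate(contract: DataContract, record: dict) -> list[str]:
    """Return a list of human-readable violations (empty if the record conforms)."""
    errors = []
    for f in contract.fields:
        value = record.get(f.name)
        if value is None:
            if f.required:
                errors.append(f"missing required field: {f.name}")
            continue
        # bool is a subclass of int in Python, so reject it explicitly
        # for every non-BOOL field.
        is_bool_mismatch = isinstance(value, bool) and f.type != "BOOL"
        if is_bool_mismatch or not isinstance(value, _PYTHON_TYPES[f.type]):
            errors.append(f"{f.name}: expected {f.type}, got {type(value).__name__}")
    known = {f.name for f in contract.fields}
    errors.extend(f"unexpected field: {key}" for key in sorted(record.keys() - known))
    return errors


orders = DataContract(
    stream="orders",
    owner="checkout-team",
    fields=(
        ContractField("order_id", "STRING"),
        ContractField("amount", "FLOAT64"),
        ContractField("coupon", "STRING", required=False),
    ),
)

print(validate(orders, {"order_id": "A-1", "amount": 12.5}))
print(validate(orders, {"order_id": 7, "extra": True}))
```

The first call returns an empty list, while the second reports a type mismatch, a missing field, and an unexpected field. A CLI or CI check built around a function like this is one way a producing team could get feedback on a new stream without involving the platform team.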
Similarly, we streamlined the creation of data models in our bronze and silver layers in BigQuery through automation built on data contracts and CI/CD. As we explored in "Datahem odyssey, the evolution of a data platform," this move towards modularity and reusability was key. Instead of reinventing the wheel, individual teams could build on standardized modules, ensuring consistency and reducing development time. This not only accelerated the creation of new data models but also improved their quality and adherence to best practices.
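As a rough illustration of contract-driven model generation, the sketch below renders a silver-layer view definition from a small contract. Everything in it is an assumption made for the example: the `render_silver_view` function, the `bronze` and `silver` dataset names, and a bronze table that stores each raw event as JSON in a `payload` column. The actual Mathem setup may well have looked different.

```python
import re

# Only plain identifiers are allowed, so contract values cannot inject SQL.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")
_ALLOWED_TYPES = {"STRING", "INT64", "FLOAT64", "BOOL", "TIMESTAMP"}


def render_silver_view(project: str, contract: dict) -> str:
    """Render a BigQuery view that casts raw JSON payload fields to typed columns.

    Assumes a hypothetical layout: `<project>.bronze.<stream>` holds raw events
    in a JSON `payload` column, and the typed view lives in `<project>.silver`.
    """
    stream = contract["stream"]
    for name in [stream, *contract["fields"]]:
        if not _IDENTIFIER.match(name):
            raise ValueError(f"invalid identifier: {name!r}")
    for bq_type in contract["fields"].values():
        if bq_type not in _ALLOWED_TYPES:
            raise ValueError(f"unsupported type: {bq_type!r}")

    columns = ",\n".join(
        f"  SAFE_CAST(JSON_VALUE(payload, '$.{name}') AS {bq_type}) AS {name}"
        for name, bq_type in contract["fields"].items()
    )
    return (
        f"CREATE OR REPLACE VIEW `{project}.silver.{stream}` AS\n"
        f"SELECT\n{columns}\n"
        f"FROM `{project}.bronze.{stream}`"
    )


example_contract = {
    "stream": "orders",
    "fields": {"order_id": "STRING", "amount": "FLOAT64"},
}
print(render_silver_view("my-project", example_contract))
```

Run in a CI/CD job whenever a contract changes, a generator along these lines keeps the silver layer consistent across teams, because nobody writes the casting logic by hand.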
Furthermore, we focused on empowering our users to easily activate the data they modeled. By providing dbt macros and robust event-driven export capabilities, we lowered the barrier to using data in operational systems and downstream applications. This significantly decreased the time it took to move from data insight to actionable outcome, a key metric for any data-driven organization.
The Metrics Speak Volumes: Developer Adoption and Workflow Velocity
The success of this shift wasn't just anecdotal. The metrics we tracked painted a clear picture of the positive impact of embracing true platform engineering. We saw a marked increase in the adoption of our self-service tools and a significant reduction in the time it took teams to complete critical data workflows.
Increased Developer Adoption: By providing intuitive and well-documented tools and golden paths, we empowered more individuals within the organization to contribute to the data ecosystem. Teams became more self-sufficient, reducing their reliance on the core platform team for routine tasks.
Accelerated Workflow Completion: The automation of key processes, such as data onboarding and model creation, dramatically reduced the time required to bring new data products to life. This increased velocity translated directly to faster time-to-value and a more agile data organization.
These metrics are the tangible proof that investing in building a robust IDP is not just a technical exercise; it's a strategic move that drives business value.
The Industry is Taking Notice: Gartner and the Rise of Platform Engineering
This isn't just a trend at Mathem. The broader industry is recognizing the critical role of platform engineering. Gartner's inclusion of platform engineering in its Hype Cycle for 2023 and 2024 underscores its growing importance. Gartner's prediction that "by 2026, about 80% of software engineering organizations will establish platform teams" highlights the momentum behind this movement.
Given that the data space often trails operational software engineering by a few years, we can anticipate a significant surge in "data platform engineering" initiatives around 2025. As Gartner aptly states, "Well-designed platforms can offer customers and business partners a seamless self-service experience, enabling users to perform valuable work with minimal overhead." The benefits they highlight – reduced cognitive load, enhanced developer experience, increased productivity, and greater independence – are precisely the outcomes we witnessed at Mathem.
Embracing the Data Platform Renaissance: A Call to Action
The time for passive data platform building is over. It's time for data engineering to fully embrace the principles of platform engineering, to build internal developer platforms that empower our users, accelerate innovation, and unlock the true potential of our data. This is the data platform renaissance – a shift from managing infrastructure to enabling creation.
By focusing on providing well-defined golden paths, intuitive self-service tools, and robust automation, we can transform our data organizations into engines of innovation. The experiences and lessons learned at Mathem, as detailed in my previous posts, offer a blueprint for this transformation.
The future of data engineering isn't just about building better pipelines; it's about building platforms that enable everyone to build great things with data. Let's move beyond the era of complex, specialist-driven data systems and embrace the power of true data platform engineering. What are your thoughts? How are you approaching this evolution within your organization? Let's connect and share our journeys.