When Should You Use the Data Platform for Operational Processes
And When Should You Avoid It?
A question I often get from readers or attendees at talks is: "When is it appropriate to use the data platform for operational processes, and when is it better to build direct connections between operational systems?" It’s a fair question since I often talk about data flywheels and operationalizing analytical data. With modern data platforms offering incredible capabilities, the line between operational and analytical workflows has become increasingly blurry. The possibilities seem endless—but just because you can do something doesn’t mean you should.
In this post, I’ll share some guidelines I’ve found helpful for deciding when to operationalize data from your data platform and when to rely on direct connections between your operational systems. I’ll also touch on the practical and ethical considerations that often come into play.
When to Use the Data Platform for Operational Processes
There are specific situations where leveraging the data platform for operational processes isn’t just viable—it’s the preferred approach. Here’s when it usually makes sense:
1. When the Use Case Requires Historical or Immutable Data
Operational systems are typically designed for real-time, transactional needs and often lack historical and immutable records. If your use case depends on tracking changes over time or analyzing historical trends, your data platform is the right place.
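To make the difference concrete, here is a minimal Python sketch under stated assumptions: a hypothetical operational store that overwrites the current value, next to a hypothetical append-only history of the kind a data platform usually keeps. The names `operational_store`, `history`, `record_change`, and `tier_as_of` are illustrative and not taken from any specific product.

```python
from datetime import datetime, timezone

# Hypothetical operational store: keeps only the latest state per customer.
operational_store = {}

# Hypothetical platform-side history: an append-only list of change events.
history = []


def record_change(customer_id, tier, changed_at):
    """Apply one change the way each side would typically keep it."""
    operational_store[customer_id] = tier            # overwritten in place
    history.append((customer_id, tier, changed_at))  # appended, never updated


def tier_as_of(customer_id, moment):
    """Return the tier that was valid at `moment`, or None if none is known."""
    valid = [
        (changed_at, tier)
        for cid, tier, changed_at in history
        if cid == customer_id and changed_at <= moment
    ]
    # The latest change at or before `moment` is the one that was in effect.
    return max(valid)[1] if valid else None


record_change("c1", "basic", datetime(2024, 1, 1, tzinfo=timezone.utc))
record_change("c1", "premium", datetime(2024, 6, 1, tzinfo=timezone.utc))

current = operational_store["c1"]
past = tier_as_of("c1", datetime(2024, 3, 1, tzinfo=timezone.utc))
```

The operational side can only answer "what is the tier now?", while the history can also answer "what was the tier in March?", which is the kind of question that points toward the data platform.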
2. When Data Needs Complex Transformations or Enrichments
If the operational process requires data that involves extensive transformations, aggregations, or enrichment from multiple sources, the data platform is often the best choice, since these tasks are resource-intensive and typically fall outside the scope of operational systems. Instead, the data platform lets you centralize these operations and deliver pre-processed, clean data to operational systems.
3. When Cross-Domain Data is Needed
Operational systems often operate in silos, while data platforms are designed to integrate data from across the organization. If your process spans multiple domains—like combining sales, marketing, and support data—the data platform provides a unified view.
4. When Governance, Quality, or Compliance is Critical
Operationalizing data from a governed data platform ensures you’re working with high-quality, consistent data. If your use case involves sensitive or regulated data, such as personal customer information, the data platform is often preferred because it provides:
Centralized governance: Ensures compliance with regulations like GDPR by managing access and enforcing data protection policies.
Data lineage and auditing: Tracks how sensitive data is used, ensuring traceability for auditing or regulatory purposes.
Controlled access: Offers role-based permissions and masking to ensure only authorized users and systems can access sensitive data.
5. When Reusability is Important
If multiple operational systems or use cases need the same data, the data platform offers a single source of truth.
6. When Access Patterns Involve Multi-Row Queries with Low Concurrency
The data platform is well-suited for scenarios where you need to query many rows of data, but:
Concurrency is low: Only a few users or systems need access at a time.
Latency tolerance is high: Response times measured in seconds or even minutes are acceptable.
Data freshness requirements are flexible: The use case doesn’t demand real-time updates but requires data to be reasonably up-to-date. A delay of a few minutes, such as with near real-time processing, is typically acceptable.
"The data platform excels when dealing with complex, multi-domain queries, large-scale data processing, and scenarios where governance and compliance are non-negotiable."
When to Build Direct Connections Instead
Despite the power of the data platform, there are cases where direct connections to operational systems are the better choice. By direct connections, I mean scenarios where the operational system itself handles the necessary analytical capabilities or connects directly to other systems to fetch the required data, bypassing the central data platform. Here’s when that usually applies:
1. When Real-Time, Low-Latency Data is Critical
If your operational process relies on real-time data with minimal latency, direct connections are usually necessary. Data platforms may introduce latency due to additional processing, making direct connections preferable for these use cases.
2. When Data Needs Are Simple
If the data required by the operational system is already available and doesn’t require transformations or enrichment, building a direct connection is often faster and simpler.
3. When Privacy and Compliance Concerns Arise
In some cases, direct connections are better for sensitive data because they avoid exposing more information than necessary. This applies when using the data platform could result in over-sharing sensitive information, or when the operational system only needs a subset of the sensitive data. In those cases:
Direct connections allow minimal data exposure: Fetch only the exact data required without exposing broader datasets.
Tightly controlled integration: Ensures sensitive data flows directly between systems without involving a third-party platform or additional processing layers.
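The idea of minimal data exposure can be sketched as a simple field allow-list. This is only an illustration under assumptions: the record, the consumer name `shipping_service`, and the helper `fetch_for` are all hypothetical, and a real integration would enforce this at the source system or API layer.

```python
# Hypothetical full customer record as held by the source system.
CUSTOMER = {
    "customer_id": "c1",
    "email": "ana@example.com",
    "date_of_birth": "1990-04-12",
    "shipping_postcode": "1012AB",
}

# Assumed allow-list: the fields each consuming system may receive.
ALLOWED_FIELDS = {
    "shipping_service": {"customer_id", "shipping_postcode"},
}


def fetch_for(consumer, record):
    """Return only the fields the named consumer is allowed to see."""
    allowed = ALLOWED_FIELDS.get(consumer, set())  # unknown consumers get nothing
    return {key: value for key, value in record.items() if key in allowed}


shipping_view = fetch_for("shipping_service", CUSTOMER)
```

The consuming system receives two fields instead of the whole record, so the email address and date of birth never leave the source system.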
4. When Operational Systems Are Complex
If the operational system has unique data requirements that tightly couple it with its source systems, it’s often better to build direct connections rather than rely on the data platform.
5. When Access Patterns Involve Single, Simple Lookups
Direct connections excel in cases where the operational process requires frequent, single-row lookups with minimal latency. Operational systems are better suited for these predictable, fast queries.
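The two access patterns look quite different in practice. The sketch below uses an in-memory SQLite database purely as a stand-in for both kinds of system, which is an assumption for illustration; the `orders` table and its rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "c1", 20.0), (2, "c1", 35.0), (3, "c2", 10.0)],
)

# Single-row lookup by primary key: the frequent, predictable query
# that operational systems are built to answer quickly.
single_order = conn.execute(
    "SELECT order_id, customer_id, amount FROM orders WHERE order_id = ?", (2,)
).fetchone()

# Multi-row aggregation over the whole table: the kind of query
# that is usually a better fit for the data platform.
totals = conn.execute(
    "SELECT customer_id, SUM(amount) FROM orders "
    "GROUP BY customer_id ORDER BY customer_id"
).fetchall()

conn.close()
```

The first query touches one row through its key; the second scans and groups every row, which becomes expensive on a busy transactional database as the table grows.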
"Direct connections shine in real-time, low-latency scenarios where simplicity and speed are essential, and tightly controlled data integration minimizes exposure."
Balancing Practicality with Ethical Considerations
A common frustration is the need to "do double work" by replicating efforts in operational systems when the data platform already has everything. However, just because the data exists in the platform doesn’t mean it should be used for every operational need. Always ask:
What is the purpose of this data access? Is it necessary for the operational process, or is it being used out of convenience?
Does this introduce risks? Could using the data platform for this operational process compromise privacy, compliance, or governance?
A Simple Decision Framework
To help you make these decisions, ask yourself the following questions:
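Does the use case require historical, enriched, or governed data? If so, lean toward the data platform.
Does the operational process require real-time, low-latency data? If so, lean toward direct connections.
Does using the data platform introduce privacy, compliance, or performance risks? If so, lean toward direct connections.
Are the access patterns complex, do they involve querying multiple rows, or do they require large-scale data processing? If so, lean toward the data platform.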
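The four questions above can be turned into a small checklist in code. This is a rough sketch, not a scoring model: the `UseCase` fields and the `recommend` function are hypothetical names, and weighting every question equally is my simplifying assumption.

```python
from dataclasses import dataclass


@dataclass
class UseCase:
    needs_historical_enriched_or_governed_data: bool
    needs_real_time_low_latency: bool
    platform_adds_privacy_compliance_or_performance_risk: bool
    has_complex_multi_row_access: bool


def recommend(use_case):
    """Tally the four questions and return a leaning, not a verdict."""
    # Questions whose "yes" points toward the data platform.
    platform_votes = sum([
        use_case.needs_historical_enriched_or_governed_data,
        use_case.has_complex_multi_row_access,
    ])
    # Questions whose "yes" points toward direct connections.
    direct_votes = sum([
        use_case.needs_real_time_low_latency,
        use_case.platform_adds_privacy_compliance_or_performance_risk,
    ])
    if platform_votes > direct_votes:
        return "data platform"
    if direct_votes > platform_votes:
        return "direct connection"
    return "case-by-case review"  # a tie means the questions alone do not decide


leaning = recommend(UseCase(True, False, False, True))
```

A tie is returned as its own outcome on purpose, since mixed answers are exactly the situations where the trade-offs need a closer look.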
Closing Thoughts
Operationalizing analytical data is a powerful way to unlock business value, but when should you leverage your data platform for that? The boundaries between operational and analytical processes will continue to blur as data platforms and operational systems grow in capability. The key is not to ask, "Can we do this?" but rather, "Should we?" By considering factors like performance, scalability, access patterns, and use case requirements, you can make informed choices that align with your organization’s needs.
That said, it’s important to remember that this is a general guideline, not a rigid rulebook. There are always exceptions—unique use cases or technologies that don’t fit neatly into one approach or the other. Furthermore, the capabilities of modern platforms and systems are evolving rapidly, continuously challenging and redefining the boundaries of when to use one approach over the other. Staying adaptable and reassessing decisions as technology advances is just as critical as making the right choice today.
Have more questions about this topic? Share your thoughts, and I’ll include them in future posts. Let’s keep this conversation going!