Downstream Lineage in Domo: Why It Matters and How We Made It Programmable

After nine months of running Domo against Snowflake, our Snowflake engineers came to us with a problem: Domo was very expensive. Multiple queries were pulling from the same fact tables, over and over. There were redundant dataflows and redundant datasets, and nobody could tell what was actually in use and what could be consolidated.

"Can't you consolidate or get rid of something?"

The challenge with consolidation is knowing what you will break. If teams haven't been strict about consistent naming conventions in dataflows or during the ingest stage, there could be cascading impacts. We've all asked the question: if I rename this column in Magic, how many cards will I have to fix? How many broken Beast Modes will there be?

That's the blast radius problem. And the answer to it is downstream lineage.

The Problem: What Depends on This?

Domo has always supported upstream lineage, the "where does this data come from?" question. You can trace a card back to its dataset, a dataset back to its dataflow, a dataflow back to its source. That's the direction most people think about lineage.

But downstream lineage is the other direction. It's "what depends on this?" If I change this dataset, which cards break? Which dashboards are affected? Which published pages go stale? That's the question that actually keeps you up at night — and until now, nobody had wired it up programmatically in the Domo ecosystem.

This isn't just a technical question. When you're running Domo Everywhere and publishing content across instances, the blast radius extends beyond your own Domo instance. A dataset change on the publisher side cascades through publications into subscriber instances you might not even know about. If you're running the multi-instance model I wrote about in Stop QA-ing in Prod, you need to know what your published content feeds — not just in your instance, but in every instance that subscribes to it.

And then there's AI Readiness. Domo's AI Readiness is its implementation of what should be a data dictionary: a description of what each column means and how it should be used. Maintaining that is cumbersome at best, and it goes stale the moment someone changes the ETL. Ideally, you want upstream lineage detection to take a naive pass at generating your dictionaries, or at minimum use the naive lineage to validate that your hand-written dictionaries are still accurate. That brings us back to the upstream question: "Where did this data come from?"

In the context of Domo Everywhere, Domo has implemented APIs to make it possible to trace the subscriber asset. But swimming in the opposite direction is hard. If I'm in the subscriber instance, "what data was consumed to give me this number?" Data lineage is no longer a technical exercise — it's a trust issue. Data trust predates technical solutions like semantic layers, Beast Mode Manager, AI Readiness, data dictionaries, and lineage graphs. It's a question of confidence. A question of getting the data into the hands of the right people at the right time.

How We Solved It

Our API is just the technical exercise of recreating lineage paths so that we can answer questions like "where does this data come from" or "what is the refresh cadence of our dataset" — which is a more distilled answer than "why is this number not what I expected."

We added get_downstream() and get_impact() to the crew-dcs library — the open-source Domo Python SDK that I maintain. Here's how it works, which APIs we call, and what the code looks like.

Tracing Upstream: Page → Dataset

Let's say you're looking at a page and want to know where its data comes from. Here's the lineage path:

graph LR
    Page --> Card
    Card --> Dataset
    Dataset --> Dataflow
    Dataflow --> Source_Dataset[Source Dataset]

Hop	From	To	Domo API	crew-dcs Method
1	Page	Card	`/api/content/v1/pages/{id}/cards`	`page.Cards.get()`
2	Card	Dataset	`/api/data/v3/datasources/{id}`	`card.DataSet.get()`
3	Dataset	Dataflow	Datacenter Lineage API (`traverseUp=true`)	`dataset.Lineage.get()`
4	Dataflow	Source Dataset	Datacenter Lineage API (`traverseUp=true`)	`dataflow.Lineage.get()`

Each hop is a separate API call. The Datacenter Lineage API is Domo's central lineage graph — it stores relationships between datasets, dataflows, and cards as a directed graph. When you call get() with traverse_up=True, it walks the graph upward and returns the full chain.
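To make the "walk the graph upward" idea concrete, here is a minimal sketch of an upstream traversal over a toy in-memory graph. It does not call Domo or crew-dcs; the `UPSTREAM` dictionary, the entity IDs, and the `trace_upstream` helper are all hypothetical stand-ins for whatever the Datacenter Lineage API actually returns.

```python
from collections import deque

# Hypothetical toy graph: each entity maps to the entities it reads from.
# The IDs and the dict shape are illustrative only, not Domo's response format.
UPSTREAM = {
    "page:sales": ["card:revenue"],
    "card:revenue": ["dataset:sales_clean"],
    "dataset:sales_clean": ["dataflow:clean_sales"],
    "dataflow:clean_sales": ["dataset:sales_raw"],
    "dataset:sales_raw": [],
}


def trace_upstream(start, parents):
    """Return every ancestor of `start`, nearest first, each listed once."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for parent in parents.get(node, []):
            if parent not in seen:  # skip anything already reached by another path
                seen.add(parent)
                order.append(parent)
                queue.append(parent)
    return order


chain = trace_upstream("page:sales", UPSTREAM)
print(chain)
```

In a real client each lookup in `parents` would be one of the API hops in the table above, which is why the number of calls grows with the depth of the chain.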

Tracing Downstream: Dataset → Card

Now the other direction. You're about to change a dataset and want to know the blast radius:

graph LR
    Dataset --> Dataflow
    Dataset --> Card
    Card --> Page
    Page --> Publication

Hop	From	To	Domo API	crew-dcs Method
1	Dataset	Dataflow	Datacenter Lineage API (`traverseDown=true`)	`dataset.Lineage.get_downstream()`
2	Dataset	Card	Datacenter Lineage API (`traverseDown=true`)	`dataset.Lineage.get_downstream()`
3	Card	Page	Card Metadata API (`/api/content/v1/cards/{id}/pages`)	`card.Lineage.get_downstream()`
4	Page	Publication	Publication API (`/api/content/v1/publications`)	`page.Lineage.get_downstream()`

The Datacenter Lineage API supports traverseDown=true, and it always has. The problem was that nobody had wired it up in a client library. Cards and pages are special cases: the pages a card sits on don't appear in the datacenter lineage graph (they're tracked through the card metadata API instead), and pages need to check whether they're part of a publication.

Here's what the code looks like:

# What would break if I change this dataset?
dataset = await DomoDataset.get_by_id(auth=auth, dataset_id="ds-123")
impact = await dataset.Lineage.get_impact()
 
# Filter to just cards and pages
impact = await dataset.Lineage.get_impact(entity_types=["CARD", "PAGE"])

Two lines to get the full downstream impact. The get_impact() method walks the entire downstream graph and returns every entity that depends on your dataset — cards, pages, dataflows, publications. It deduplicates, it handles the special cases for cards and pages, and it excludes the parent entity (you don't want the dataset itself in its own blast radius).
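The behavior described above (walk everything downstream, deduplicate, exclude the starting entity, optionally filter by type) can be sketched on a toy graph. This is not the crew-dcs implementation; `Entity`, `DOWNSTREAM`, `impact_of`, and the type labels are hypothetical names chosen for illustration.

```python
from collections import deque
from typing import NamedTuple


class Entity(NamedTuple):
    entity_type: str
    entity_id: str


# Hypothetical toy graph: each entity maps to the entities that consume it.
# Card c-2 is reachable by two paths, so it exercises the deduplication.
ROOT = Entity("DATASET", "ds-123")
DOWNSTREAM = {
    ROOT: [Entity("DATAFLOW", "df-1"), Entity("CARD", "c-1"), Entity("CARD", "c-2")],
    Entity("DATAFLOW", "df-1"): [Entity("DATASET", "ds-456")],
    Entity("DATASET", "ds-456"): [Entity("CARD", "c-2")],
    Entity("CARD", "c-1"): [Entity("PAGE", "p-1")],
    Entity("CARD", "c-2"): [Entity("PAGE", "p-1")],
    Entity("PAGE", "p-1"): [Entity("PUBLICATION", "pub-1")],
}


def impact_of(root, children, entity_types=None):
    """Breadth-first walk of everything downstream of `root`.

    The root itself is never returned, and each entity appears once
    even if several paths lead to it.
    """
    seen = {root}
    found = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child in seen:
                continue
            seen.add(child)
            found.append(child)
            queue.append(child)
    if entity_types is not None:
        wanted = set(entity_types)
        found = [e for e in found if e.entity_type in wanted]
    return found


everything = impact_of(ROOT, DOWNSTREAM)
cards_and_pages = impact_of(ROOT, DOWNSTREAM, entity_types=["CARD", "PAGE"])
print(len(everything), [e.entity_id for e in cards_and_pages])
```

Filtering after the walk, rather than during it, matters here: pruning dataflows mid-traversal would hide any cards that are only reachable through them.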

For federated content — Domo Everywhere — the traversal crosses instance boundaries:

# What consumes this published content on subscriber instances?
downstream = await dataset.Lineage.get_downstream(
    parent_auth=publisher_auth,
    parent_auth_retrieval_fn=get_auth_for_instance
)

This walks the publisher → publication → subscription → subscriber chain. So when someone asks "if I change this column in Magic, how many cards will I have to fix?" — you can answer. Not just for your instance, but across every instance that subscribes to your content.
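One way to picture the cross-instance walk is a traversal whose nodes are (instance, entity) pairs, with an auth lookup the first time it enters a new instance. The sketch below assumes that shape; `LOCAL_EDGES`, `SUBSCRIPTIONS`, the instance names, and `trace_federated` are hypothetical, and only the idea of a per-instance auth retrieval function comes from the call above.

```python
from collections import deque

# Hypothetical lineage inside each instance: entity -> entities that consume it.
LOCAL_EDGES = {
    "publisher": {
        "dataset:ds-123": ["card:c-1"],
        "card:c-1": ["page:p-1"],
        "page:p-1": ["publication:pub-1"],
    },
    "subscriber-a": {
        "dataset:sub-ds-1": ["card:sub-c-7"],
        "card:sub-c-7": ["page:sub-p-2"],
    },
}

# Hypothetical publication -> subscriber copies, keyed by (instance, entity).
SUBSCRIPTIONS = {
    ("publisher", "publication:pub-1"): [
        ("subscriber-a", "dataset:sub-ds-1"),
        ("subscriber-b", "dataset:sub-ds-9"),
    ],
}

auth_calls = []


def get_auth_for_instance(instance):
    """Stand-in for a credential lookup; records which instances were requested."""
    auth_calls.append(instance)
    return f"token-for-{instance}"


def trace_federated(start_instance, start_node, auth_fn):
    """Walk downstream across instances, fetching auth once per instance."""
    auth_cache = {start_instance: auth_fn(start_instance)}
    start = (start_instance, start_node)
    seen = {start}
    found = []
    queue = deque([start])
    while queue:
        instance, node = queue.popleft()
        local = [(instance, c) for c in LOCAL_EDGES.get(instance, {}).get(node, [])]
        remote = SUBSCRIPTIONS.get((instance, node), [])
        for hop in local + remote:
            if hop in seen:
                continue
            if hop[0] not in auth_cache:
                # A real client would use this credential for calls to that instance.
                auth_cache[hop[0]] = auth_fn(hop[0])
            seen.add(hop)
            found.append(hop)
            queue.append(hop)
    return found


reached = trace_federated("publisher", "dataset:ds-123", get_auth_for_instance)
print(reached)
```

Caching the credential per instance keeps the number of auth lookups tied to the number of subscribers rather than the number of entities visited.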

This took me about three days. Not the API calls; those were straightforward. The three days were spent on the edge cases: cards having two separate code paths that needed to be unified, pages needing publication awareness, federated content crossing instance boundaries, and making sure the interface was clean enough that you'd actually want to use it.

The Technical Exercise Is Not the End of the Problem

Being able to trace lineage is table stakes — although surprisingly cumbersome in Domo. But then what? You've got the lineage graph. You've got the blast radius. You've got the AI Readiness scores. Now what?

The real value isn't in the API calls or the code. It's in surfacing the right data at the right time. That's where an experienced consultant like DataCrew can help deliver value.

Think about it this way. Your Snowflake team says Domo is expensive. You need to consolidate. You can now trace what depends on what — but you still need someone to make the judgment calls. Which dataflows are redundant? Which datasets can be merged without breaking trust? Which cards can be retired because nobody has looked at them in six months? The lineage tells you what's connected. It doesn't tell you what matters.

Data trust is about confidence. It's about getting the data into the hands of the right people at the right time. The lineage API is how you build the map. But the map is not the territory. You need someone who knows the territory — who understands which datasets feed the CEO's dashboard, which Beast Modes are actually business logic versus visual formatting, which publications are live versus stale.

That's the work. The API is just the starting point.

Want Help With This?

If you're staring at a Domo instance with hundreds of datasets and no idea what depends on what, you're not alone. That's the default state of most Domo environments I walk into.

I work with teams to implement lineage-aware automation — from impact analysis scripts to full pipeline governance workflows. And I upskill your team so they can maintain and extend the system themselves. Because the best automation is the one your team actually understands.

Get in touch →

The lineage tooling in crew-dcs builds on the publish lineage work I wrote about in Trace Domo Publish Lineage. If you haven't read that one, it covers the upstream side — how to map what's production before you delete it.
