
Data Engineering Applied to Growth (Growth Tech)

#data-engineering #growth #analytics #ab-testing #business-metrics #ltv


The most common failure mode of a data team is building excellent infrastructure that answers questions nobody is asking. The pipelines run clean, the dashboards look beautiful, the data warehouse is perfectly modeled, and the business still makes decisions based on gut feeling because nobody connected the data to the decisions that actually create revenue.

Growth Tech is the discipline that closes that gap. It is the intersection of data engineering, product analytics, and business economics: the set of practices that makes data a direct input to decisions about where to invest, what to build, and whether the investment is working.

This article covers the foundational concepts: funnel analysis, LTV and CAC, A/B testing infrastructure, and analytical pipeline architecture. By the end, you will have a framework for measuring whether your product is actually growing, and an engineering approach to getting the data that makes those measurements possible.

The growth accounting framework

Before building any infrastructure, you need clarity on what you are measuring. Growth accounting is the conceptual framework that makes business performance visible.

At its core, growth in any subscription or transactional business reduces to a single identity:

Revenue(t) = Revenue(t-1) + New Revenue - Churned Revenue + Expansion Revenue

This breaks down further into three components:

  - New revenue: revenue from customers acquired in the current period.
  - Churned revenue: revenue lost from customers who downgraded or left.
  - Expansion revenue: additional revenue from existing customers who upgraded or bought more.

Every metric you track should connect to one of these three components. If a metric you are tracking does not have a clear line to new, churned, or expansion revenue, ask whether it needs to exist.
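As a sketch of the idea, the decomposition can be computed directly from per-customer revenue in two consecutive periods. This is illustrative Python, not anything from a real pipeline; it treats contraction as negative expansion so the identity holds exactly:

```python
def growth_accounting(prev: dict[str, float], curr: dict[str, float]) -> dict[str, float]:
    """Decompose period-over-period revenue change into three components.

    prev/curr map customer_id -> revenue in the previous/current period.
    'expansion' is the net change for retained customers, so it can be
    negative when contraction outweighs upgrades.
    """
    new = sum(v for c, v in curr.items() if c not in prev)
    churned = sum(v for c, v in prev.items() if c not in curr)
    expansion = sum(curr[c] - prev[c] for c in curr if c in prev)
    return {"new": new, "churned": churned, "expansion": expansion}

prev = {"a": 100.0, "b": 50.0, "c": 30.0}
curr = {"a": 120.0, "b": 50.0, "d": 80.0}  # c churned, d is new, a expanded

parts = growth_accounting(prev, curr)
# Identity: Revenue(t) = Revenue(t-1) + New - Churned + Expansion
assert sum(curr.values()) == sum(prev.values()) + parts["new"] - parts["churned"] + parts["expansion"]
print(parts)  # {'new': 80.0, 'churned': 30.0, 'expansion': 20.0}
```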

Cohort thinking: the foundation of growth analysis

The most important shift from naive analytics to growth analytics is cohort-based thinking. Instead of asking "how many users do we have today?", cohort thinking asks "of the users who joined in March, how many are still here in June, and how much are they worth?"

A cohort is a group of users sharing a common characteristic, usually their signup date. Tracking cohort behavior over time reveals how retention, engagement, and revenue evolve for each group, and whether newer cohorts perform better or worse than older ones.

Without cohort analysis, you cannot distinguish between a growing business with bad retention (acquisition is masking churn) and a growing business with good retention (acquisition adds to a compounding base). These two businesses have radically different futures.
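A minimal sketch of the cohort retention computation, assuming a simple in-memory representation of users and the months in which they were active (names and data are illustrative; in practice this lives in a warehouse model):

```python
from collections import defaultdict

def retention_by_cohort(users):
    """Compute month-N retention per signup cohort.

    `users` is a list of (signup_month, active_months) pairs, where months
    are integers (e.g. months since launch). Returns
    {cohort: {offset_in_months: retained_fraction}}.
    """
    cohort_size = defaultdict(int)
    retained = defaultdict(lambda: defaultdict(int))
    for signup, active in users:
        cohort_size[signup] += 1
        for m in active:
            if m >= signup:
                retained[signup][m - signup] += 1
    return {
        cohort: {off: n / cohort_size[cohort] for off, n in sorted(offsets.items())}
        for cohort, offsets in retained.items()
    }

users = [
    (0, [0, 1, 2]),  # month-0 signup, active for three months
    (0, [0, 1]),
    (0, [0]),
    (1, [1, 2]),
]
print(retention_by_cohort(users))
# cohort 0 retains 3/3, then 2/3, then 1/3 of its users
```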

Funnel analysis: mapping where users convert and where they leave

A funnel is a sequence of steps a user must complete to reach a desired outcome. Every product has multiple funnels: the signup funnel, the onboarding funnel, the purchase funnel, the renewal funnel.

Funnel analysis answers: at each step, what fraction of users proceed vs. drop off? And why?

Building funnel analysis infrastructure

A well-instrumented funnel requires:

  1. Event tracking at every meaningful step: each user action that is part of the funnel emits an event ("signup_page_viewed", "email_entered", "email_verified", "profile_completed", "first_purchase_completed"). Events need consistent naming conventions and must carry the user identifier (or anonymous session ID before signup) to link the steps.

  2. Event schema governance: event names, property names, and data types must be consistent across teams and time. A purchase_completed event emitted by the iOS app and the web app should have identical schemas. Drift here is a data quality problem that compounds.

  3. Session and identity resolution: users often interact anonymously before signing up. Your instrumentation must merge the pre-signup anonymous identifier with the post-signup user identifier, so you can attribute the full pre-conversion journey to each user.

  4. A funnel query layer: once events are in your warehouse, you need queries that can reconstruct funnels: "of all users who saw the checkout page in this date range, what fraction completed purchase within 24 hours?" Tools like dbt, Metabase, Amplitude, or Mixpanel make this queryable without raw SQL for each analysis.
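The query-layer idea in step 4 can be sketched in plain Python. In practice this would be a warehouse query, but the in-memory version shows the logic: users must reach each step in order, within a window of the first step. Event names and the 24-hour window are illustrative:

```python
from datetime import datetime, timedelta

FUNNEL = ["cart_viewed", "checkout_started", "payment_info_entered", "purchase_completed"]

def funnel_conversion(events, steps, window):
    """events: iterable of (user_id, event_name, timestamp).
    Counts users who reach each step in order, with every step completed
    within `window` of the first step."""
    first_seen = {}  # (user, event) -> earliest timestamp
    for user, name, ts in events:
        key = (user, name)
        if key not in first_seen or ts < first_seen[key]:
            first_seen[key] = ts

    reached = {u: ts for (u, n), ts in first_seen.items() if n == steps[0]}
    deadline = {u: ts + window for u, ts in reached.items()}
    counts = [len(reached)]
    for step in steps[1:]:
        nxt = {}
        for u, last_ts in reached.items():
            ts = first_seen.get((u, step))
            if ts is not None and last_ts <= ts <= deadline[u]:
                nxt[u] = ts
        reached = nxt
        counts.append(len(reached))
    return dict(zip(steps, counts))

t0 = datetime(2024, 3, 1, 12, 0)
events = [
    ("u1", "cart_viewed", t0),
    ("u1", "checkout_started", t0 + timedelta(minutes=5)),
    ("u1", "purchase_completed", t0 + timedelta(minutes=10)),  # skipped payment step
    ("u2", "cart_viewed", t0),
    ("u2", "checkout_started", t0 + timedelta(minutes=1)),
    ("u2", "payment_info_entered", t0 + timedelta(minutes=2)),
    ("u2", "purchase_completed", t0 + timedelta(minutes=3)),
    ("u3", "cart_viewed", t0),
]
print(funnel_conversion(events, FUNNEL, timedelta(hours=24)))
# {'cart_viewed': 3, 'checkout_started': 2, 'payment_info_entered': 1, 'purchase_completed': 1}
```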

What funnel analysis reveals

A typical checkout funnel might look like this:

| Step | Users | Conversion Rate | Drop-off Rate |
|---|---|---|---|
| Cart viewed | 10,000 | – | – |
| Checkout started | 6,200 | 62% | 38% |
| Payment info entered | 4,100 | 66% | 34% |
| Purchase completed | 2,300 | 56% | 44% |
| Overall conversion | – | 23% | – |

The largest drop-off in this funnel is between payment info entered and purchase completed: 44% of users who entered payment info did not complete the purchase. This is the highest-value optimization target. Is it a payment failure? A price sensitivity moment? An unclear call-to-action? The funnel tells you where to look; the qualitative investigation tells you why.

LTV and CAC: the economic engine of growth

Customer Lifetime Value (LTV) and Customer Acquisition Cost (CAC) are the two numbers that determine whether a business can grow sustainably.

LTV is the total revenue expected from a customer over their relationship with the product. Because customers leave over time, you weight expected future revenue by the probability of retention at each period.

A simplified LTV formula for a subscription business:

LTV = ARPU × (1 / Monthly Churn Rate)

Where ARPU is Average Revenue Per User. If ARPU is $50/month and monthly churn is 5%, LTV = $50 × (1/0.05) = $1,000.

This is a simplification: a proper LTV model accounts for cohort-level churn differences, expansion revenue, payment failures, and discount rates. But the intuition is important: LTV is primarily determined by churn. Reducing monthly churn from 5% to 3% increases LTV by 67%, while increasing ARPU by 20% increases LTV by only 20%.
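The churn-sensitivity claim above is easy to verify with the simplified formula (illustrative numbers only):

```python
def simple_ltv(arpu: float, monthly_churn: float) -> float:
    """Simplified subscription LTV: ARPU times expected lifetime in months."""
    return arpu * (1 / monthly_churn)

base = simple_ltv(50, 0.05)         # $1,000
lower_churn = simple_ltv(50, 0.03)  # ~$1,667: +67% from cutting churn 5% -> 3%
higher_arpu = simple_ltv(60, 0.05)  # $1,200: +20% from a 20% ARPU increase
print(base, round(lower_churn), higher_arpu)
```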

CAC is the total cost of acquiring one new customer: all sales, marketing, and demand-generation costs divided by the number of new customers acquired in the same period.

CAC = (Sales Costs + Marketing Costs) / New Customers Acquired

The LTV:CAC ratio and payback period

The central health metric for any growth business is the LTV:CAC ratio. A ratio below 1 means you lose money on every customer you acquire; a commonly cited benchmark for a healthy subscription business is 3:1 or better.

The payback period is the time it takes for a customer's cumulative revenue to repay their CAC. A payback period of 12 months means you need a customer to stay for a year before they become profitable. In a high-churn business, many customers will leave before paying back.
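A quick sketch of both health metrics under the simplified assumptions above (flat ARPU, no margin or discounting; all numbers are illustrative):

```python
import math

def ltv_cac_ratio(arpu: float, monthly_churn: float, cac: float) -> float:
    """LTV:CAC ratio using the simplified LTV = ARPU / churn formula."""
    ltv = arpu / monthly_churn
    return ltv / cac

def payback_months(arpu: float, cac: float) -> int:
    """Months of cumulative revenue needed to repay acquisition cost
    (ignores churn and gross margin for simplicity)."""
    return math.ceil(cac / arpu)

print(ltv_cac_ratio(50, 0.05, 250))  # 4.0
print(payback_months(50, 250))       # 5
```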

Building LTV/CAC infrastructure

Accurate LTV and CAC tracking requires connecting three data sources that are often in different systems:

  1. Revenue data: from your payment processor (Stripe, Braintree), CRM, or billing system. Must include customer identifiers, payment dates, amounts, and refunds.
  2. Cost data: from your marketing platforms (Google Ads, Meta, LinkedIn), sales tools (CRM), and any other acquisition spending. Must be attributed by channel and time period.
  3. Product data: user events and engagement data that can serve as leading indicators of retention and churn.

The pipeline: extract these sources into your data warehouse, link them on customer identifier, and build models that compute LTV and CAC at the cohort, channel, and time-period level. This lets you answer: "Is the LTV:CAC ratio better for customers we acquire through paid search vs. organic search? Is it improving over time? Which months produced the most valuable cohorts?"
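As an illustrative sketch of that final join, assuming flattened extracts of the revenue and cost sources (all identifiers, channels, and numbers here are made up):

```python
from collections import defaultdict

def channel_economics(revenue, acquisition, spend):
    """Compute realized LTV and CAC per acquisition channel.

    revenue: list of (customer_id, amount) from the billing extract;
    acquisition: customer_id -> channel, from attribution;
    spend: channel -> total acquisition spend for the same period.
    """
    rev = defaultdict(float)
    customers = defaultdict(set)
    for cust, amount in revenue:
        channel = acquisition[cust]
        rev[channel] += amount
        customers[channel].add(cust)
    out = {}
    for channel, total_spend in spend.items():
        n = len(customers[channel])
        out[channel] = {"ltv": rev[channel] / n, "cac": total_spend / n}
    return out

result = channel_economics(
    revenue=[("c1", 600.0), ("c2", 300.0), ("c3", 900.0)],
    acquisition={"c1": "paid_search", "c2": "paid_search", "c3": "organic"},
    spend={"paid_search": 400.0, "organic": 100.0},
)
print(result)
# paid_search: LTV 450, CAC 200; organic: LTV 900, CAC 100
```

In a real warehouse this is a join on customer identifier across the three extracted sources, grouped by cohort and channel.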

A/B Testing: the infrastructure of experimentation

A/B testing is the mechanism that transforms hypotheses into evidence. Without it, product decisions are based on opinion and best guesses. With it, they are based on causal evidence from your actual users.

But A/B testing is an engineering discipline, not just a product practice. Running experiments correctly at scale requires real infrastructure.

The core components of an experimentation platform

Assignment: randomly dividing users into control and treatment groups in a way that is consistent (a user who was in the control group yesterday should still be in the control group today for the same experiment), and balanced (groups should be statistically comparable before treatment).

Exposure logging: recording which users were assigned to which variant, with timestamps. Without complete exposure logs, you cannot compute the denominator of your metric calculations.

Metric computation: for each experiment, computing the metric value for each variant and the statistical test that determines whether the difference between variants is signal or noise.

Guardrail metrics: experiments optimizing for one metric can inadvertently degrade another. An experiment that increases click-through rate by adding a confusing CTA might decrease purchase completion. Guardrail metrics are automatically checked for every experiment and flag unexpected degradations.

Statistical validity: the most common mistake in A/B testing is peeking: checking results before the planned sample size is reached and stopping the experiment when results look good. This dramatically inflates the false positive rate. A proper experimentation platform uses sequential testing or Bayesian methods that are valid under continuous monitoring.
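The consistent-assignment property described above is commonly achieved by hashing the user and experiment identifiers together, so no assignment store is needed. A minimal sketch (not any particular platform's implementation):

```python
import hashlib

def assign_variant(user_id: str, experiment: str,
                   variants: tuple[str, ...] = ("control", "treatment")) -> str:
    """Deterministic, roughly uniform assignment: the same user always lands
    in the same variant for a given experiment. Including the experiment
    name in the hash input gives independent randomization per experiment."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(variants)
    return variants[bucket]

# Stable across calls, for any user and experiment:
assert assign_variant("user-42", "checkout-cta") == assign_variant("user-42", "checkout-cta")
```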

The minimum viable experimentation stack

For most teams getting started, the stack does not need to be elaborate:

  - a feature-flag service (or a deterministic hash on user ID) for variant assignment,
  - exposure events routed into the warehouse alongside the rest of your product events,
  - metric and significance computations built as warehouse queries or dbt models.

At larger scale (hundreds of experiments running simultaneously), purpose-built platforms such as Statsig, Eppo, and GrowthBook provide the assignment consistency, guardrail monitoring, and statistical rigor that scale requires.

What to measure in an experiment

The most common mistake is measuring only the metric the experiment was designed to improve. A complete experiment analysis includes:

  - the primary metric the experiment targets,
  - guardrail metrics that must not degrade,
  - segment-level effects (new vs. returning users, platforms, geographies),
  - data-quality checks, such as whether the split between variants matches the planned allocation.

Analytical pipeline architecture for growth

All of the above requires a well-designed analytical pipeline. Here is the architecture that supports growth analytics at scale.

The modern data stack

| Layer | Technology | Role |
|---|---|---|
| Ingestion | Fivetran, Airbyte, custom CDC | Load source data into warehouse |
| Event streaming | Segment, RudderStack, custom Kafka | Collect and route product events |
| Warehouse | BigQuery, Snowflake, Redshift | Central analytical datastore |
| Transformation | dbt | Define business logic, build models |
| Metrics layer | dbt Metrics, Cube | Consistent metric definitions |
| BI / Exploration | Metabase, Looker, Superset | Self-serve dashboards and queries |
| Experimentation | Statsig, Eppo, GrowthBook | A/B test assignment and analysis |

The dbt-centric model layer

dbt (data build tool) is the standard for analytical transformations. It allows your data team to define business logic in SQL with software engineering practices: version control, testing, documentation, and modular composition.

A growth analytics dbt project typically has these model layers:

sources/          → raw tables from source systems (events, payments, CRM)
staging/          → cleaned and type-cast source tables (one-to-one with sources)
intermediate/     → reusable business logic (session attribution, identity resolution)
marts/
  ├── growth/     → funnel models, cohort retention, LTV, CAC
  ├── product/    → feature usage, engagement, DAU/WAU/MAU
  └── finance/    → revenue, MRR, ARR, churn

The key benefit of a metrics layer (dbt Metrics or Cube): the definition of "Monthly Recurring Revenue" or "7-day retention" is written once, in version-controlled code, and is consistent across every dashboard and report that references it. This eliminates the most common data credibility problem: two dashboards showing different values for the same metric because they compute it differently.

The importance of data quality for growth decisions

Growth decisions made on incorrect data are worse than decisions made with no data, because they create false confidence. A team that thinks their funnel conversion is 25% when it is actually 17% will underinvest in conversion optimization.

Data quality for analytical pipelines requires:

  - schema, uniqueness, and referential-integrity tests on source and model tables (dbt tests are the common tool),
  - freshness monitoring, so stale source data is flagged before it reaches dashboards,
  - anomaly detection on key metrics, to catch silent instrumentation breakage,
  - clear ownership and defined contracts for each event and source table.

Connecting data to financial outcomes: a practical framework

The final step is the one most data teams skip: making the business impact of engineering and product work visible in financial terms.

The North Star Metric

A North Star Metric is a single metric that best captures the core value your product delivers to users. It is the leading indicator of long-term revenue health. The right metric differs by product type: a marketplace might track completed transactions, a collaboration tool weekly active teams, a media product time spent consuming content, an e-commerce product repeat purchase rate.

Choosing the right North Star Metric is a strategic decision, not a data decision. But instrumenting it, tracking it by cohort, and connecting it to revenue outcomes is an engineering problem.

The weekly growth review

The growth review is the ritual that connects data to decisions. A well-structured weekly growth review answers:

  1. What happened to our key metrics this week vs. last week and vs. the same week last year?
  2. Which experiments completed this week and what did they show?
  3. What are the top conversion or retention issues flagged by data this week?
  4. What does the LTV/CAC look like for the most recent cohorts?

The data team's job is to make answering these questions fast, reliable, and self-service. If the growth review requires a data engineer to run custom queries each week, the pipeline is not finished yet.

Closing thought

Data engineering applied to growth is not about the sophistication of your tech stack. It is about closing the loop between user behavior (what people do), business outcomes (what that behavior is worth), and product decisions (what we should build next).

The teams that do this well are the ones that treat their analytical infrastructure with the same rigor they apply to their production systems: clear ownership, data quality testing, defined contracts, and measurable SLOs.

The competitive advantage of data is not in having it; almost everyone has it. It is in having it clean, timely, and connected to the decisions that actually move the business. That is the engineering problem worth solving.