Google BigQuery data into Customer.io (the retention-operator way)


Overview

If you’re running retention in Customer.io, BigQuery is usually where the “real truth” lives: orders, returns, subscriptions, inventory, margins, and all the modeled fields your team actually trusts. The practical goal is simple—get that BigQuery truth into Customer.io in a way that keeps segments accurate and triggers firing consistently; if you want a second set of eyes on the mapping, book a strategy call.

In most retention programs, BigQuery becomes the source for high-leverage audiences like “high intent, no purchase,” “replenishment overdue,” or “VIP at churn risk.” The difference between a great program and a noisy one is almost always identity resolution and event hygiene—not message copy.

How It Works

BigQuery itself doesn’t “send campaigns”—it feeds Customer.io with people attributes and events that journeys can reliably trigger off. The retention win comes when the data arrives with the right identifiers, consistent timestamps, and a stable schema so segments don’t drift and automations don’t silently stop enrolling.

  • Two data shapes matter:
    • Person attributes (e.g., lifetime_value, last_order_date, subscription_status, predicted_next_order_date) to power segmentation and message personalization.
    • Events (e.g., order_completed, cart_abandoned, return_initiated, back_in_stock) to trigger journeys and measure conversion cleanly.
  • Identity resolution is the make-or-break: Customer.io needs a stable person identifier (commonly id) and/or a consistent primary key like email/phone. If BigQuery rows don’t map deterministically to a single profile, you’ll see duplicate people, broken suppression logic, and “why did they get this twice?” complaints.
  • Data mapping drives trigger reliability: If your BigQuery export calls something orderDate but your journey expects order_date, you don’t get a warning—you get a dead trigger. Same story with timestamps: wrong timezone or string formatting can push people into (or out of) “within last X days” segments.
  • Sync cadence controls freshness: Cart recovery and browse abandonment need near-real-time events; LTV, churn risk, and replenishment can tolerate hourly/daily updates. Mixing these without a plan is how you end up with “cart abandoned” firing 6 hours late.
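The two data shapes above can be sketched as plain records. The field names here are illustrative examples, not a required Customer.io schema:

```python
# Illustrative shapes for the two kinds of records a BigQuery sync
# would hand to Customer.io. Field names are examples only.

# One row per person: attributes used for segmentation and personalization.
person = {
    "id": "cust_10482",                         # canonical identifier
    "email": "jane@example.com",
    "lifetime_value": 412.50,                   # number, not a string
    "last_order_date": "2024-05-02T14:07:00Z",  # ISO 8601, UTC
    "subscription_status": "active",
}

# One row per event: identifier + name + timestamp + properties.
event = {
    "person_id": "cust_10482",
    "name": "order_completed",                  # stable snake_case event name
    "timestamp": "2024-05-02T14:07:00Z",
    "properties": {
        "order_id": "ord_99821",                # doubles as a dedupe key
        "total": 89.00,
    },
}
```

Note that the timestamp is stored as a UTC ISO 8601 string and the monetary fields as numbers, which is exactly what the "within last X days" and value-based segments depend on.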

Step-by-Step Setup

Before you wire anything up, decide what BigQuery is responsible for (source of truth) versus what your app/web tracking is responsible for (real-time behavioral events). That division prevents duplicate events and keeps your audiences explainable.

  1. Pick your canonical identifier.
    • Decide what Customer.io’s id will be (Shopify customer ID, internal user ID, etc.).
    • Ensure every BigQuery row you plan to sync includes that identifier consistently.
    • If you must rely on email, normalize it (lowercase, trimmed) and treat it as a key—don’t let multiple emails map to the same person without a merge strategy.
  2. Define your “people” table/view for Customer.io.
    • Create a BigQuery view that is one row per person.
    • Include only attributes you will actually segment on or personalize with (avoid dumping 200 columns “just in case”).
    • Standardize types: booleans as true/false, timestamps as timestamps, numbers as numbers.
  3. Define your “events” table/view for Customer.io.
    • Create a BigQuery view that is one row per event with: person identifier, event name, event timestamp, and event properties.
    • Use a stable event naming convention (e.g., order_completed not Order Completed).
    • Include an event-level unique key (e.g., event_id or order_id + type) to support deduping upstream if needed.
  4. Map fields to Customer.io’s expected schema.
    • Confirm which fields become person attributes vs event properties.
    • Lock the names—changing last_order_date to most_recent_order later will break segments and liquid references.
  5. Set sync frequency by use case.
    • Cart recovery: aim for minutes, not hours (or keep cart events in Track/Web SDK and reserve BigQuery for enrichment).
    • Reactivation/LTV: daily is usually fine, as long as timestamps are correct.
  6. Validate in Customer.io before you turn on journeys.
    • Spot-check 10 real customers: profile attributes, latest events, and timestamps.
    • Build a test segment that should match a known set (e.g., “purchased in last 30 days”) and compare counts to BigQuery.
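Steps 1 and 3 can be sketched in a few lines. This is a minimal illustration, assuming an email fallback and an `order_id`-based event key; the helper names are hypothetical, not part of any Customer.io tooling:

```python
import hashlib

def normalize_email(raw: str) -> str:
    """Lowercase and trim so 'Jane@Example.com ' and 'jane@example.com'
    resolve to the same key (step 1)."""
    return raw.strip().lower()

def event_dedupe_key(person_id: str, event_name: str, order_id: str) -> str:
    """Deterministic event-level key (step 3) so a re-run of the sync
    cannot create duplicate order_completed events downstream."""
    raw = f"{person_id}:{event_name}:{order_id}"
    return hashlib.sha256(raw.encode()).hexdigest()

# Same inputs always produce the same key, so dedupe is safe across re-runs.
key1 = event_dedupe_key("cust_10482", "order_completed", "ord_99821")
key2 = event_dedupe_key("cust_10482", "order_completed", "ord_99821")
assert key1 == key2
```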

When Should You Use This Feature

BigQuery → Customer.io data-in is the right move when your retention targeting depends on modeled or multi-source data that your app tracking can’t reliably provide. It’s especially valuable when you need consistency across email/SMS suppression, VIP logic, and churn prevention.

  • Repeat purchase / replenishment: sync last_purchase_date, days_since_purchase, and category-level purchase history so replenishment journeys don’t rely on brittle “product viewed” logic.
  • Reactivation based on profitability: segment by gross_margin_ltv or return_rate so you’re not spending SMS on low-value churn risks.
  • Cart recovery with smarter suppression: keep cart events real-time, but use BigQuery to feed “already purchased since cart” or “high refund risk” flags to prevent awkward sends.
  • D2C scenario: A skincare brand has customers who buy a cleanser every ~45 days. BigQuery calculates predicted_next_order_date and routine_type (from quiz + order history). Customer.io uses those fields to trigger a replenishment series only when the customer is overdue and inventory is available—no more blasting everyone at day 30.
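The overdue check in the skincare scenario reduces to date arithmetic. A sketch, assuming `cycle_days` comes from BigQuery (roughly 45 in the example) and a small grace window; the function name and thresholds are illustrative:

```python
from datetime import date, timedelta

def is_replenishment_due(last_order: date, cycle_days: int, today: date,
                         grace_days: int = 5) -> bool:
    """True when the customer is past their predicted reorder date
    plus a grace window, so the journey fires when they're actually
    overdue rather than on a fixed day-30 blast."""
    predicted_next = last_order + timedelta(days=cycle_days)
    return today >= predicted_next + timedelta(days=grace_days)

# Day 30 after purchase: not yet due. Day 52: overdue, eligible for the series.
assert not is_replenishment_due(date(2024, 4, 1), 45, date(2024, 5, 1))
assert is_replenishment_due(date(2024, 4, 1), 45, date(2024, 5, 23))
```

In practice you would also gate on the inventory flag before enrolling, as the scenario describes.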

Operational Considerations

This is where most teams get burned: the data technically flows, but segments don’t match expectations and journeys enroll the wrong people. Treat your BigQuery → Customer.io pipeline like production infrastructure, not a one-time integration task.

  • Segmentation accuracy depends on “one row per person.” If your people sync accidentally creates multiple profiles (same email, different IDs), suppression and frequency capping stop working the way you think they do.
  • Timestamp consistency is non-negotiable. Standardize on UTC (or be extremely explicit) and store timestamps as timestamps. “Within the past X days” logic is sensitive to timezone drift and string parsing.
  • Event duplication creates phantom performance. If order_completed lands twice, your conversion reporting inflates and post-purchase journeys can double-send. Build dedupe upstream using an event key.
  • Schema changes break orchestration quietly. Renaming a column, changing a type (number → string), or nesting JSON differently can cause segments to drop to zero or journeys to stop enrolling—often without an obvious alert.
  • Decide what’s real-time vs batch. In practice, this tends to break when teams try to do cart/browse abandonment purely from warehouse data; by the time it lands, the customer already converted or moved on.

Implementation Checklist

If you want this to drive revenue without creating messaging chaos, you need a tight checklist that covers identity, mapping, and verification—not just “data is syncing.”

  • Canonical person identifier chosen and documented (id strategy).
  • BigQuery people view is one row per person; no duplicates by identifier.
  • BigQuery events view includes: person identifier, event name, event timestamp, event properties, and an event dedupe key.
  • Field names and types match what Customer.io segments/journeys will reference.
  • Timezone and timestamp formatting validated with real profiles.
  • Test segments built and reconciled against BigQuery counts.
  • At least one test journey triggered end-to-end using synced data.
  • Monitoring plan for volume drops/spikes and schema changes.

Expert Implementation Tips

The best-performing retention stacks treat warehouse-fed data as “audience infrastructure.” The goal is stable, explainable inputs so you can iterate on offers and creative without wondering if the audience is broken.

  • Separate behavioral events from modeled attributes. Keep web/app events (viewed product, added to cart) real-time; use BigQuery for computed fields (LTV tiers, churn risk, replenishment windows).
  • Create “operator-friendly” attributes. Instead of sending raw tables, sync fields like is_vip, is_churn_risk, next_replenishment_window_start. Your future self will thank you when you need to build segments quickly.
  • Version your event names. If you need to change payload shape, add order_completed_v2 rather than mutating the original event mid-flight.
  • Use suppression attributes proactively. A single boolean like do_not_sms or cs_open_ticket (from support systems via BigQuery) prevents a lot of brand damage during heavy promo periods.
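Deriving operator-friendly booleans from modeled fields is a one-step transform. The thresholds below are placeholders for your own model outputs, not recommended values:

```python
def derive_operator_attributes(ltv: float, churn_score: float,
                               vip_threshold: float = 500.0,
                               churn_threshold: float = 0.7) -> dict:
    """Collapse modeled numbers into booleans operators can segment on
    directly (is_vip, is_churn_risk) instead of raw scores."""
    return {
        "is_vip": ltv >= vip_threshold,
        "is_churn_risk": churn_score >= churn_threshold,
    }

attrs = derive_operator_attributes(ltv=820.0, churn_score=0.81)
assert attrs == {"is_vip": True, "is_churn_risk": True}
```

Doing this in the BigQuery view keeps the thresholds versioned in one place, so a segment named “VIP” means the same thing everywhere.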

Common Mistakes to Avoid

Most “Customer.io isn’t working” issues are really data-contract issues. These are the ones that show up repeatedly when BigQuery becomes the source of truth.

  • Sending multiple identifiers without a clear priority. If some rows use internal ID and others use email, you’ll fragment profiles and lose journey continuity.
  • Trying to warehouse-sync carts as a batch process. Cart recovery needs speed; batch carts often arrive after purchase and trigger awkward “you left something behind” messages.
  • Overloading Customer.io with raw JSON blobs. If an attribute can’t be segmented on cleanly, it won’t help retention ops day-to-day.
  • Ignoring dedupe. Duplicate order_completed is the fastest way to double-send post-purchase and corrupt reporting.
  • Changing column names/types without updating segments. Segments don’t “fail loudly”—they just stop matching people.
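An explicit identifier priority prevents the first mistake above. A sketch, with hypothetical column names:

```python
from typing import Optional

def resolve_person_id(row: dict) -> Optional[str]:
    """Pick exactly one identifier per row with a stated priority:
    internal ID first, then normalized email. Rows with neither are
    skipped rather than synced as profile fragments."""
    if row.get("internal_id"):
        return str(row["internal_id"])
    if row.get("email"):
        return row["email"].strip().lower()
    return None

assert resolve_person_id({"internal_id": 42, "email": "a@b.com"}) == "42"
assert resolve_person_id({"internal_id": None, "email": " A@B.com "}) == "a@b.com"
assert resolve_person_id({}) is None
```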

Summary

BigQuery → Customer.io works best when you treat it as a data contract: stable identifiers, clean timestamps, and a deliberate split between events and attributes. Get that right and your segments stay trustworthy, triggers stay reliable, and retention iterations move faster.

Implement BigQuery Reverse ETL with Propel

If BigQuery is already your source of truth, the next step is making that truth usable inside Customer.io without constant debugging. When teams implement this well, they spend more time improving offers and journeys—and less time chasing why a segment dropped to zero.

If you want help pressure-testing identity mapping, event design, and sync cadence for your retention use cases, book a strategy call and we’ll walk through a practical setup that won’t break the next time your schema changes.
