Overview
If you’re running retention in Customer.io, BigQuery is usually where the “real truth” lives: orders, returns, subscriptions, inventory, margins, and all the modeled fields your team actually trusts. The practical goal is simple—get that BigQuery truth into Customer.io in a way that keeps segments accurate and triggers firing consistently; if you want a second set of eyes on the mapping, book a strategy call.
In most retention programs, BigQuery becomes the source for high-leverage audiences like “high intent, no purchase,” “replenishment overdue,” or “VIP at churn risk.” The difference between a great program and a noisy one is almost always identity resolution and event hygiene—not message copy.
How It Works
BigQuery itself doesn’t “send campaigns”—it feeds Customer.io with people attributes and events that journeys can reliably trigger off. The retention win comes when the data arrives with the right identifiers, consistent timestamps, and a stable schema so segments don’t drift and automations don’t silently stop enrolling.
- Two data shapes matter: person attributes and events.
- Person attributes (e.g., `lifetime_value`, `last_order_date`, `subscription_status`, `predicted_next_order_date`) to power segmentation and message personalization.
- Events (e.g., `order_completed`, `cart_abandoned`, `return_initiated`, `back_in_stock`) to trigger journeys and measure conversion cleanly.
- Identity resolution is the make-or-break: Customer.io needs a stable person identifier (commonly `id`) and/or a consistent primary key like email/phone. If BigQuery rows don’t map deterministically to a single profile, you’ll see duplicate people, broken suppression logic, and “why did they get this twice?” complaints.
- Data mapping drives trigger reliability: if your BigQuery export calls something `orderDate` but your journey expects `order_date`, you don’t get a warning; you get a dead trigger. Same story with timestamps: wrong timezone or string formatting can push people into (or out of) “within last X days” segments.
- Sync cadence controls freshness: cart recovery and browse abandonment need near-real-time events; LTV, churn risk, and replenishment can tolerate hourly/daily updates. Mixing these without a plan is how you end up with “cart abandoned” firing six hours late.
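To make the two shapes concrete, here is a hypothetical sketch of what a person-attributes update and an event record might look like before they are handed to a sync tool. The field names and payload layout are illustrative (taken from the examples in this article), not a required Customer.io schema:

```python
from datetime import datetime, timezone

def person_attributes_payload(person_id: str, row: dict) -> dict:
    """One row per person: attributes used for segmentation and personalization."""
    return {
        "id": person_id,
        "attributes": {
            "lifetime_value": float(row["lifetime_value"]),   # numbers as numbers
            "last_order_date": row["last_order_date"],        # ISO-8601 UTC string
            "subscription_status": row["subscription_status"],
        },
    }

def event_payload(person_id: str, name: str, properties: dict, ts: datetime) -> dict:
    """One row per event: person identifier, stable name, timestamp, properties."""
    return {
        "person_id": person_id,
        "name": name,                      # stable snake_case event name
        "timestamp": int(ts.timestamp()),  # epoch seconds, UTC
        "data": properties,
    }

update = person_attributes_payload("cust_123", {
    "lifetime_value": "412.50",
    "last_order_date": "2024-05-01T14:03:00Z",
    "subscription_status": "active",
})
evt = event_payload("cust_123", "order_completed", {"order_id": "ord_9"},
                    datetime(2024, 5, 1, 14, 3, tzinfo=timezone.utc))
```

The point of separating the two builders is the contract: attributes describe current state (safe to overwrite on every sync), while events are immutable facts that should only ever be appended.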
Step-by-Step Setup
Before you wire anything up, decide what BigQuery is responsible for (source of truth) versus what your app/web tracking is responsible for (real-time behavioral events). That division prevents duplicate events and keeps your audiences explainable.
- Pick your canonical identifier.
- Decide what Customer.io’s `id` will be (Shopify customer ID, internal user ID, etc.).
- Ensure every BigQuery row you plan to sync includes that identifier consistently.
- If you must rely on email, normalize it (lowercase, trimmed) and treat it as a key; don’t let multiple emails map to the same person without a merge strategy.
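If email has to serve as a key, the normalization should happen in the pipeline, not ad hoc. A minimal sketch:

```python
def normalize_email(raw: str) -> str:
    """Normalize an email for use as a join key: trim whitespace, lowercase.
    Deliberately avoids riskier transforms (e.g., stripping '+tag' suffixes),
    which can merge genuinely distinct accounts."""
    return raw.strip().lower()
```

Run the same function on every system that produces email keys, so BigQuery and your app tracking agree on what “the same person” means.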
- Define your “people” table/view for Customer.io.
- Create a BigQuery view that is one row per person.
- Include only attributes you will actually segment on or personalize with (avoid dumping 200 columns “just in case”).
- Standardize types: booleans as true/false, timestamps as timestamps, numbers as numbers.
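Type standardization is easiest to enforce in code at the pipeline boundary. A hedged sketch, using example field names from this article rather than a required schema:

```python
from datetime import datetime, timezone

def standardize_person_row(row: dict) -> dict:
    """Coerce one raw person row to consistent types: booleans stay booleans,
    timestamps become ISO-8601 UTC strings, numbers become numbers."""
    ts = row["last_order_date"]
    if isinstance(ts, str):
        # Accept ISO-8601 input; fromisoformat() doesn't take a trailing 'Z'.
        ts = datetime.fromisoformat(ts.replace("Z", "+00:00"))
    ts = ts.astimezone(timezone.utc)
    return {
        "id": str(row["id"]),
        "is_subscriber": bool(row["is_subscriber"]),
        "lifetime_value": float(row["lifetime_value"]),
        "last_order_date": ts.isoformat(),
    }
```

Coercing at the boundary means a warehouse column that silently flips from number to string shows up as a pipeline error instead of a segment that quietly stops matching.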
- Define your “events” table/view for Customer.io.
- Create a BigQuery view that is one row per event with: person identifier, event name, event timestamp, and event properties.
- Use a stable event naming convention (e.g., `order_completed`, not `Order Completed`).
- Include an event-level unique key (e.g., `event_id`, or `order_id` + type) to support deduping upstream if needed.
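With that unique key in place, dropping repeats before they reach Customer.io is cheap. A minimal sketch, following the `order_id` + type convention above:

```python
def event_key(evt: dict) -> str:
    """Build a stable dedupe key from the order ID and the event name."""
    return f"{evt['order_id']}:{evt['name']}"

def dedupe(events: list) -> list:
    """Keep only the first occurrence of each event key."""
    seen, out = set(), []
    for evt in events:
        key = event_key(evt)
        if key not in seen:
            seen.add(key)
            out.append(evt)
    return out
```

In a real pipeline the `seen` set would live in the warehouse (a dedupe table or a `ROW_NUMBER()`-style filter) rather than in memory, but the contract is the same: one key, one delivered event.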
- Map fields to Customer.io’s expected schema.
- Confirm which fields become person attributes vs event properties.
- Lock the names: changing `last_order_date` to `most_recent_order` later will break segments and liquid references.
- Set sync frequency by use case.
- Cart recovery: aim for minutes, not hours (or keep cart events in Track/Web SDK and reserve BigQuery for enrichment).
- Reactivation/LTV: daily is usually fine, as long as timestamps are correct.
- Validate in Customer.io before you turn on journeys.
- Spot-check 10 real customers: profile attributes, latest events, and timestamps.
- Build a test segment that should match a known set (e.g., “purchased in last 30 days”) and compare counts to BigQuery.
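That count comparison can be scripted rather than eyeballed. A sketch, assuming you can export the candidate profiles from both BigQuery and Customer.io and apply the same rule to each:

```python
from datetime import datetime, timedelta, timezone

def purchased_within_days(profiles: list, days: int, now: datetime) -> set:
    """Return the IDs of profiles whose last_order_date falls inside the window.
    Expects last_order_date as an ISO-8601 UTC string."""
    cutoff = now - timedelta(days=days)
    return {
        p["id"] for p in profiles
        if datetime.fromisoformat(p["last_order_date"].replace("Z", "+00:00")) >= cutoff
    }

# Apply the same rule to exports from both systems; IDs present in one set
# but not the other point directly at sync, identity, or timezone drift.
```

Comparing sets of IDs, not just counts, is the useful part: two systems can agree on “1,214 people” while disagreeing about which people those are.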
When Should You Use This Feature
BigQuery → Customer.io data-in is the right move when your retention targeting depends on modeled or multi-source data that your app tracking can’t reliably provide. It’s especially valuable when you need consistency across email/SMS suppression, VIP logic, and churn prevention.
- Repeat purchase / replenishment: sync `last_purchase_date`, `days_since_purchase`, and category-level purchase history so replenishment journeys don’t rely on brittle “product viewed” logic.
- Reactivation based on profitability: segment by `gross_margin_ltv` or `return_rate` so you’re not spending SMS on low-value churn risks.
- Cart recovery with smarter suppression: keep cart events real-time, but use BigQuery to feed “already purchased since cart” or “high refund risk” flags to prevent awkward sends.
- D2C scenario: a skincare brand has customers who buy a cleanser every ~45 days. BigQuery calculates `predicted_next_order_date` and `routine_type` (from quiz + order history). Customer.io uses those fields to trigger a replenishment series only when the customer is overdue and inventory is available; no more blasting everyone at day 30.
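The gating logic in that scenario is simple to express. A hypothetical sketch (field names follow the example above; the inventory lookup is assumed to come from elsewhere):

```python
from datetime import datetime, timezone

def eligible_for_replenishment(profile: dict, in_stock: bool, now: datetime) -> bool:
    """Enroll only when the customer is past their predicted next order date
    and the product is actually available."""
    predicted = datetime.fromisoformat(
        profile["predicted_next_order_date"].replace("Z", "+00:00")
    )
    return now >= predicted and in_stock
```

Whether this check lives in the BigQuery view (as a boolean column) or in the journey's entry conditions matters less than that it lives in exactly one place.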
Operational Considerations
This is where most teams get burned: the data technically flows, but segments don’t match expectations and journeys enroll the wrong people. Treat your BigQuery → Customer.io pipeline like production infrastructure, not a one-time integration task.
- Segmentation accuracy depends on “one row per person.” If your people sync accidentally creates multiple profiles (same email, different IDs), suppression and frequency capping stop working the way you think they do.
- Timestamp consistency is non-negotiable. Standardize on UTC (or be extremely explicit) and store timestamps as timestamps. “Within the past X days” logic is sensitive to timezone drift and string parsing.
- Event duplication creates phantom performance. If `order_completed` lands twice, your conversion reporting inflates and post-purchase journeys can double-send. Build dedupe upstream using an event key.
- Schema changes break orchestration quietly. Renaming a column, changing a type (number → string), or nesting JSON differently can cause segments to drop to zero or journeys to stop enrolling, often without an obvious alert.
- Decide what’s real-time vs batch. In practice, this tends to break when teams try to do cart/browse abandonment purely from warehouse data; by the time it lands, the customer already converted or moved on.
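Because schema changes fail quietly, a cheap guard is to assert the contract on every batch before it syncs. A minimal sketch, assuming you maintain an expected-schema mapping alongside the pipeline:

```python
# Hypothetical expected contract: field name -> required Python type.
EXPECTED = {"id": str, "lifetime_value": float, "last_order_date": str}

def schema_violations(row: dict) -> list:
    """Return human-readable problems with one synced row; empty means OK."""
    problems = []
    for field, expected_type in EXPECTED.items():
        if field not in row:
            problems.append(f"missing field: {field}")
        elif not isinstance(row[field], expected_type):
            problems.append(f"wrong type for {field}: {type(row[field]).__name__}")
    return problems
```

Wire the non-empty case into whatever alerting you already have; the goal is that a renamed column pages an engineer instead of zeroing out a segment.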
Implementation Checklist
If you want this to drive revenue without creating messaging chaos, you need a tight checklist that covers identity, mapping, and verification—not just “data is syncing.”
- Canonical person identifier chosen and documented (`id` strategy).
- BigQuery people view is one row per person; no duplicates by identifier.
- BigQuery events view includes: person identifier, event name, event timestamp, event properties, and an event dedupe key.
- Field names and types match what Customer.io segments/journeys will reference.
- Timezone and timestamp formatting validated with real profiles.
- Test segments built and reconciled against BigQuery counts.
- At least one test journey triggered end-to-end using synced data.
- Monitoring plan for volume drops/spikes and schema changes.
Expert Implementation Tips
The best-performing retention stacks treat warehouse-fed data as “audience infrastructure.” The goal is stable, explainable inputs so you can iterate on offers and creative without wondering if the audience is broken.
- Separate behavioral events from modeled attributes. Keep web/app events (viewed product, added to cart) real-time; use BigQuery for computed fields (LTV tiers, churn risk, replenishment windows).
- Create “operator-friendly” attributes. Instead of sending raw tables, sync fields like `is_vip`, `is_churn_risk`, `next_replenishment_window_start`. Your future self will thank you when building segments fast.
- Version your event names. If you need to change payload shape, add `order_completed_v2` rather than mutating the original event mid-flight.
- Use suppression attributes proactively. A single boolean like `do_not_sms` or `cs_open_ticket` (from support systems via BigQuery) prevents a lot of brand damage during heavy promo periods.
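A suppression gate can then sit in front of every send decision. A hedged sketch using the boolean attributes mentioned above:

```python
def can_send_sms(profile: dict) -> bool:
    """Block SMS when any suppression flag is set. Missing flags default to
    False here (fail open); in production you may prefer to fail closed and
    alert, since a dropped column otherwise disables suppression silently."""
    return not (profile.get("do_not_sms", False) or profile.get("cs_open_ticket", False))
```

The fail-open vs fail-closed choice is worth making explicitly: it decides whether a broken sync sends too much or too little.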
Common Mistakes to Avoid
Most “Customer.io isn’t working” issues are really data-contract issues. These are the ones that show up repeatedly when BigQuery becomes the source of truth.
- Sending multiple identifiers without a clear priority. If some rows use internal ID and others use email, you’ll fragment profiles and lose journey continuity.
- Trying to warehouse-sync carts as a batch process. Cart recovery needs speed; batch carts often arrive after purchase and trigger awkward “you left something behind” messages.
- Overloading Customer.io with raw JSON blobs. If an attribute can’t be segmented on cleanly, it won’t help retention ops day-to-day.
- Ignoring dedupe. Duplicate `order_completed` is the fastest way to double-send post-purchase and corrupt reporting.
- Changing column names/types without updating segments. Segments don’t “fail loudly”; they just stop matching people.
Summary
BigQuery → Customer.io works best when you treat it as a data contract: stable identifiers, clean timestamps, and a deliberate split between events and attributes. Get that right and your segments stay trustworthy, triggers stay reliable, and retention iterations move faster.
Implement BigQuery Reverse ETL with Propel
If BigQuery is already your source of truth, the next step is making that truth usable inside Customer.io without constant debugging. When teams implement this well, they spend more time improving offers and journeys—and less time chasing why a segment dropped to zero.
If you want help pressure-testing identity mapping, event design, and sync cadence for your retention use cases, book a strategy call and we’ll walk through a practical setup that won’t break the next time your schema changes.