Overview
If your retention program lives and dies by accurate segments, then your MySQL tables are either your biggest advantage—or the reason campaigns “randomly” miss people. When you pipe MySQL into Customer.io, the goal isn’t just getting rows across; it’s getting identity, timestamps, and event semantics consistent enough that cart recovery, replenishment, and winback triggers fire every time. If you want a second set of eyes on your data mapping before you scale sends, book a strategy call.
In most D2C stacks, MySQL is the source of truth for orders, customers, subscriptions, and sometimes carts. The retention unlock is turning those tables into Customer.io people attributes and events that segment cleanly and trigger journeys predictably.
How It Works
At a high level, you’re taking structured data in MySQL and syncing it into Customer.io as either (1) updates to a person profile (attributes) or (2) behavioral records (events). The difference matters operationally: attributes are great for segmentation and state, while events are what you want for precise triggers and timing.
- Identity resolution comes first. Every sync needs a stable identifier that Customer.io can match to a single person. In practice, that usually means syncing by `email` (if it's always present and verified) or an internal `customer_id` that you store as the Customer.io `id`. If you mix identifiers, you'll create duplicates and your suppression/consent logic will drift.
- Map MySQL fields into two buckets:
  - Person attributes (state): e.g., `total_orders`, `last_order_at`, `vip_tier`, `subscription_status`, `sms_opt_in`.
  - Events (behavior): e.g., `order_placed`, `cart_updated`, `subscription_canceled`. Events should include a timestamp and the properties you'll use in message logic (SKU, category, cart value, etc.).
- Timestamp hygiene drives trigger reliability. Customer.io segments and "within the past X days" logic depend on consistent time formats and time zones. If your MySQL timestamps are mixed (UTC vs. local) or nullable, you'll see people enter late, re-enter, or never qualify.
- Deduping is your job, not Customer.io's. If your MySQL query emits multiple rows per person for what should be a single attribute state (like `last_order_at`), you'll overwrite attributes unpredictably. For events, duplicates will inflate conversion attribution and cause repeated sends unless you build idempotency upstream (e.g., unique event IDs).
Real D2C scenario: You want a "replenishment" journey that triggers 21 days after a customer's first purchase of your collagen powder. If `last_order_at` is updated by an old historical backfill job, your segment will think they purchased "recently" and they'll never enter the journey. If you instead send an `order_placed` event with `order_id`, `sku`, and `purchased_at`, you can trigger off the event timestamp and avoid the attribute overwrite trap.
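To make the event-over-attribute approach concrete, here is a minimal Python sketch of shaping an `order_placed` event around its own purchase timestamp. The helper name and field layout are illustrative assumptions, not a fixed Customer.io schema; the point is that the event carries `order_id` (usable as an idempotency key) and the time the purchase actually happened.

```python
def build_order_placed_event(order_id, sku, purchased_at_ts, order_total):
    """Sketch: shape an order_placed event so journeys can trigger off
    the purchase time itself, not a mutable last_order_at attribute.
    Field names here are illustrative, not a required schema."""
    return {
        "name": "order_placed",
        # Unix timestamp of the purchase: the journey keys off when the
        # order happened, even if the sync job runs hours later.
        "timestamp": int(purchased_at_ts),
        "data": {
            "order_id": order_id,  # doubles as an idempotency key
            "sku": sku,
            "order_total": order_total,
        },
    }

event = build_order_placed_event(
    "ord_1042", "collagen-powder-300g", 1700000000, 39.00
)
print(event["name"], event["data"]["order_id"])
```

A replenishment journey can then trigger on `order_placed` where `sku` matches the collagen product, delayed 21 days from the event timestamp, with no dependency on attribute overwrites.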
Step-by-Step Setup
The cleanest implementations start with a mapping doc and a minimal set of “retention-critical” fields. Get the fundamentals right first (identity + 5–10 key attributes + 2–4 key events), then expand once you trust your segments.
- Pick your canonical identifier.
  - Decide whether Customer.io's primary key will be `email` or your internal `customer_id`.
  - Make sure the identifier exists in every MySQL dataset you plan to sync (orders, carts, subscriptions).
- Define your retention-critical schema.
  - Attributes: `first_order_at`, `last_order_at`, `total_orders`, `lifetime_value`, `last_product_category`, `sms_opt_in`, `email_consent`.
  - Events: `cart_updated`, `checkout_started`, `order_placed`, `refund_issued`, `subscription_canceled`.
- Write MySQL queries that emit one row per person for attributes.
  - Use aggregates/window functions so each person has a single authoritative value per attribute.
  - Example pattern: `MAX(order_date)` for `last_order_at`, `COUNT(*)` for `total_orders`, `SUM(order_total)` for LTV.
- Write MySQL queries that emit one row per event instance for events.
  - Include a stable `event_id` (often the source primary key, like `order_id` or `cart_id` + timestamp) to prevent duplicates.
  - Include `occurred_at` and the properties you'll actually use (SKU, category, cart_value, discount_code).
- Sync into Customer.io with a consistent naming convention.
  - Attributes: snake_case, no ambiguity (e.g., `last_order_at`, not `last_purchase`).
  - Events: verb_noun (e.g., `order_placed`; `cart_abandoned` if you compute it).
- Validate in Customer.io before you turn on automations.
  - Spot-check 20 real customers: confirm profile attributes match your warehouse truth.
  - Confirm event counts and timestamps look right (no future dates, no 1970 timestamps, no duplicates).
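The one-row-per-person aggregate pattern from the steps above can be sketched as follows. This demo uses an in-memory SQLite table standing in for a MySQL `orders` table (the table and column names are illustrative); the aggregate SQL is the same shape you would run against MySQL.

```python
import sqlite3

# In-memory stand-in for a MySQL orders table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (customer_id TEXT, order_date TEXT, order_total REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("c1", "2024-01-05", 30.0),
        ("c1", "2024-03-10", 45.0),
        ("c2", "2024-02-01", 20.0),
    ],
)

# One authoritative row per customer: MAX() for recency,
# COUNT(*) for order count, SUM() for lifetime value.
rows = conn.execute(
    """
    SELECT customer_id,
           MAX(order_date)  AS last_order_at,
           COUNT(*)         AS total_orders,
           SUM(order_total) AS lifetime_value
    FROM orders
    GROUP BY customer_id
    ORDER BY customer_id
    """
).fetchall()
print(rows)
```

Because `GROUP BY customer_id` guarantees exactly one row per person, the sync can never emit conflicting values for `last_order_at` or `total_orders` in the same run.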
When Should You Use This Feature
MySQL → Customer.io syncs earn their keep when your retention logic depends on data that isn’t reliably available from front-end tracking alone. If you’re running serious post-purchase and winback programs, you’ll eventually need server-side truth for orders, refunds, subscriptions, and inventory-adjacent signals.
- Cart recovery that doesn’t lie. If your “Added to Cart” pixel misses Safari users or iOS app sessions, syncing carts from MySQL gives you a consistent abandonment trigger.
- Repeat purchase and replenishment. Trigger journeys off `order_placed` events and SKU/category properties so you can set different cadences for skincare vs. supplements.
- Reactivation based on real purchase recency. Build segments off `last_order_at` and `total_orders` so you can separate "lapsed one-timers" from "formerly loyal" customers.
- Refund-aware suppression. If someone refunded their first order, you usually don't want them in your standard cross-sell flow. MySQL is where that truth sits.
Operational Considerations
This is where most retention programs quietly break: the data technically syncs, but segments drift and triggers misfire because the underlying model wasn’t built for orchestration.
- Segmentation depends on stable attribute semantics. Decide what `last_order_at` means (paid? fulfilled? shipped?) and never change it midstream without versioning. Otherwise your "30-day lapsed" segment becomes untrustworthy overnight.
- Backfills can accidentally trigger campaigns. If you replay historical orders as fresh `order_placed` events, you'll dump thousands of people into post-purchase journeys. In practice, you either (a) backfill as attributes only, or (b) backfill events with historical timestamps and add campaign guardrails (e.g., only trigger when `occurred_at` is within the last day).
- Identity conflicts create duplicates and suppression leaks. If you sometimes identify by `email` and other times by `customer_id`, you'll end up messaging the same human twice, or messaging someone who opted out on the "other" profile.
- Event volume and cardinality matter. Cart events can be noisy (every quantity change). Decide whether you want raw events (`cart_updated`) or a computed event (`cart_abandoned`) emitted once per cart session to keep journeys sane.
- Orchestration reality: Customer.io needs deterministic inputs. If your MySQL data arrives late (e.g., orders written hours after checkout), your cart recovery timing will be off. Fix the upstream write timing or adjust journey delays to match reality.
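The backfill guardrail above can be sketched as a small gate in the sync layer: replayed events keep their historical timestamps, but only events younger than a cutoff are allowed to feed live journeys. The function name and the one-day window are illustrative assumptions, not a Customer.io setting.

```python
from datetime import datetime, timedelta, timezone

def is_live_trigger(occurred_at: datetime,
                    now: datetime,
                    max_age: timedelta = timedelta(days=1)) -> bool:
    """Sketch of a backfill guardrail: only events that occurred within
    max_age of 'now' should be allowed to fire live campaigns."""
    return now - occurred_at <= max_age

now = datetime(2024, 6, 1, 12, 0, tzinfo=timezone.utc)
fresh = datetime(2024, 6, 1, 9, 0, tzinfo=timezone.utc)   # 3 hours old
backfilled = datetime(2023, 11, 20, tzinfo=timezone.utc)  # historical replay

print(is_live_trigger(fresh, now))       # True
print(is_live_trigger(backfilled, now))  # False
```

The same check can alternatively live inside campaign entry conditions; putting it upstream just means a misconfigured journey can't accidentally pick up a historical import.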
Implementation Checklist
If you want segments you can trust, treat this like a data product: define the contract, validate it, and monitor it. This checklist is the minimum I’d want before scaling sends.
- Canonical identifier chosen and used everywhere (email or customer_id)
- Attribute dictionary defined (names, types, definitions, source tables)
- Event dictionary defined (event names, required properties, timestamp field, idempotency key)
- All timestamps normalized to UTC (or consistently defined) and validated
- One-row-per-person logic for attribute sync queries
- Deduping/idempotency strategy for event syncs (unique event_id)
- Backfill plan that won’t accidentally trigger live journeys
- Spot-checks completed on real customers (profiles + recent events)
- Segments created for QA (e.g., “orders last 24h”, “carts last 2h”)
- Ongoing monitoring plan (daily deltas, null rate checks, duplicate checks)
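The monitoring items on this checklist (duplicate checks, null-rate checks) can be sketched as a daily QA pass over a batch of synced events. Thresholds, field names, and the report shape here are illustrative assumptions.

```python
def qa_event_batch(events, max_null_rate=0.01):
    """Sketch of a daily QA check: flag duplicate event_ids and
    missing occurred_at timestamps in a batch of events to sync."""
    ids = [e.get("event_id") for e in events]
    nulls = sum(1 for e in events if e.get("occurred_at") is None)
    null_rate = nulls / len(events) if events else 0.0
    duplicates = len(ids) - len(set(ids))
    return {
        "duplicate_event_ids": duplicates,
        "null_timestamp_rate": null_rate,
        "passed": duplicates == 0 and null_rate <= max_null_rate,
    }

batch = [
    {"event_id": "o1", "occurred_at": "2024-06-01T10:00:00Z"},
    {"event_id": "o2", "occurred_at": None},                    # null timestamp
    {"event_id": "o2", "occurred_at": "2024-06-01T11:00:00Z"},  # duplicate id
]
report = qa_event_batch(batch)
print(report["duplicate_event_ids"], report["passed"])
```

Wiring a check like this into the sync job (and alerting on `passed == False`) catches contract drift before it reaches live journeys.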
Expert Implementation Tips
Once the basics are in, the wins come from making your data model journey-friendly. That usually means fewer, higher-quality events and attributes that represent customer state cleanly.
- Prefer "state attributes" for suppression and eligibility. Things like `is_refunder`, `subscription_status`, and `vip_tier` should be attributes so every campaign can reference them without replaying event history.
- Use events for moments in time that drive messaging. `order_placed` and `subscription_canceled` are perfect triggers. Don't try to trigger off an attribute that changes frequently unless you're okay with edge cases.
- Compute one `cart_abandoned` event instead of reacting to every cart change. In most D2C brands, raw cart updates are too noisy. A single abandonment event per cart (with `cart_value` and `top_items`) makes recovery flows cleaner and easier to attribute.
- Version your naming when definitions change. If you redefine LTV to exclude refunds, create `lifetime_value_net` instead of overwriting `lifetime_value`. It prevents silent segment breakage.
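The "one `cart_abandoned` event per cart" tip above can be sketched as a small reducer: collapse raw `cart_updated` rows down to the latest state per cart, then emit a single abandonment event for carts idle past a cutoff. The function name and the one-hour window are illustrative assumptions.

```python
from datetime import datetime, timedelta

def compute_abandonments(cart_updates, now, idle=timedelta(hours=1)):
    """Sketch: reduce noisy cart_updated rows to one cart_abandoned
    event per cart, using the last update as the cart's final state."""
    latest = {}  # cart_id -> (last update time, cart_value at that time)
    for u in cart_updates:
        cur = latest.get(u["cart_id"])
        if cur is None or u["updated_at"] > cur[0]:
            latest[u["cart_id"]] = (u["updated_at"], u["cart_value"])
    return [
        {"name": "cart_abandoned", "cart_id": cid,
         "cart_value": value, "occurred_at": ts}
        for cid, (ts, value) in latest.items()
        if now - ts >= idle  # only carts idle past the cutoff
    ]

now = datetime(2024, 6, 1, 12, 0)
updates = [
    {"cart_id": "a", "updated_at": datetime(2024, 6, 1, 9, 0),  "cart_value": 20.0},
    {"cart_id": "a", "updated_at": datetime(2024, 6, 1, 9, 30), "cart_value": 55.0},
    {"cart_id": "b", "updated_at": datetime(2024, 6, 1, 11, 45), "cart_value": 10.0},
]
events = compute_abandonments(updates, now)
print([(e["cart_id"], e["cart_value"]) for e in events])
```

Cart "a" has been idle since 9:30 and is emitted once with its final value; cart "b" was touched 15 minutes ago and is left alone, so the recovery journey never fires mid-session.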
Common Mistakes to Avoid
Most “Customer.io isn’t working” issues are really “our data contract is inconsistent.” These are the ones I see repeatedly when teams sync from MySQL.
- Mixing identifiers across sources. Orders keyed by `customer_id`, carts keyed by `email`, subscriptions keyed by phone: this is how you get duplicate people and broken suppression.
- Overwriting `last_order_at` with the wrong timestamp. Using `created_at` for orders that later fail payment will inflate recency and suppress winbacks.
- Backfilling events into live triggers. Historical imports accidentally firing post-purchase and replenishment journeys is a classic self-inflicted incident.
- Sending high-cardinality junk into attributes. Don't store entire cart JSON blobs as a single attribute if you need to segment on it. Either normalize into event properties or store a few queryable fields (`cart_value`, `item_count`, `top_sku`).
- Assuming "near real-time" when your pipeline is batch. If your sync runs every 6 hours, a 30-minute cart recovery email will miss the window. Align journey timing to actual data latency.
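The mixed-identifier mistake above is usually fixed by resolving every source row to one canonical identifier before anything is synced. As a sketch (the mapping table, function name, and fields are hypothetical stand-ins for a real identity-mapping table in MySQL):

```python
# Hypothetical identity map, normally a table keyed by verified email.
EMAIL_TO_CUSTOMER = {"ana@example.com": "c_101"}

def canonical_id(row):
    """Sketch: resolve any source row (orders keyed by customer_id,
    carts keyed by email, etc.) to the one canonical customer_id.
    Returns None when the row can't be resolved -- hold it for review
    instead of creating a duplicate profile."""
    if row.get("customer_id"):
        return row["customer_id"]
    email = (row.get("email") or "").strip().lower()
    return EMAIL_TO_CUSTOMER.get(email)

# An order row and a cart row for the same human resolve identically.
print(canonical_id({"customer_id": "c_101"}))
print(canonical_id({"email": "Ana@Example.com "}))
```

Routing unresolvable rows to a review queue, rather than falling back to a second identifier, is what keeps suppression and consent attached to a single profile.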
Summary
If MySQL is your truth, sync it into Customer.io like you’re building a retention-grade data contract: one identity, clean timestamps, and a clear split between state (attributes) and behavior (events). When you do that, segments stop drifting and triggers become predictable—which is what actually lifts repeat purchase and reactivation.
Implement SQL Reverse ETL with Propel
If you’re pulling retention-critical data out of MySQL, the hard part is rarely the connection—it’s keeping identity, deduping, and field definitions stable as the business evolves. In practice, that’s where most teams want an operator-led setup and monitoring layer so journeys in Customer.io keep firing reliably as you add SKUs, channels, and new data sources. If you want help pressure-testing your mapping and rollout plan, book a strategy call.