Data In: How to get reliable people + event data into Customer.io

Overview

If you want retention automations to behave, you need dependable data flowing into Customer.io, not “mostly correct” events that break segmentation and misfire triggers. Weird campaign entry counts, duplicate profiles, or cart flows that miss obvious abandoners usually point to a data-in problem worth fixing before you touch creative. If you want a second set of eyes on your tracking plan, you can book a strategy call.

In most retention programs, the highest-leverage work is getting identity and event semantics right: one person, one timeline, consistent properties. That’s what keeps cart recovery, replenishment, and winback triggers firing when they should, and only when they should.

How It Works

Customer.io’s automations and segments are only as good as the inputs: person profiles (attributes) and events (behavior). Your job is to make sure both arrive with consistent IDs and predictable schemas so Customer.io can resolve identity, evaluate segment membership, and trigger journeys without ambiguity.

  • People (profiles) are the anchor. A “person” record holds attributes like email, phone, acquisition source, first order date, total orders, and consent flags. Journeys frequently filter on these attributes (e.g., “sms_consent = true” or “orders_count >= 2”).
  • Events are the timeline. Events like Product Viewed, Added to Cart, Checkout Started, and Order Completed drive triggers and branching. Each event should carry the properties needed to make decisions (SKU, cart value, currency, items array, discount code, etc.).
  • Identity resolution is where most teams get burned. If anonymous browsing activity isn’t merged into the known profile at the right moment (typically at email capture, login, or checkout), Customer.io can’t connect “Added to Cart” to the same person who later becomes “jane@brand.com.” That’s how you end up with cart abandon flows that undercount or send to the wrong person.
  • Mapping + naming consistency drives segmentation accuracy. If one system sends order_total and another sends total, you’ll build segments that silently exclude people. Standardize event names and property keys early, then enforce them.
  • Trigger reliability depends on event timing. A cart journey typically assumes Added to Cart arrives immediately and Order Completed arrives fast enough to suppress sends. If your order event lags by 30–60 minutes, you’ll send “Still thinking?” emails to people who already bought.
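
The naming and property points above can be sketched as a small pre-send check. This is a minimal illustration, not Customer.io functionality: the contract contents, the alias table, and the function names are all hypothetical, and you would run something like it in your own pipeline before events leave your systems.

```python
from typing import Any

# Hypothetical contract: canonical event names mapped to their required property keys.
REQUIRED_PROPERTIES: dict[str, set[str]] = {
    "Added to Cart": {"cart_id", "value", "currency", "items"},
    "Order Completed": {"order_id", "order_total", "currency", "items"},
}

# Hypothetical aliases sent by different sources, mapped to one canonical key.
PROPERTY_ALIASES: dict[str, str] = {"total": "order_total"}


def canonicalize(properties: dict[str, Any]) -> dict[str, Any]:
    """Rename known alias keys to their canonical form; leave other keys alone."""
    return {PROPERTY_ALIASES.get(key, key): value for key, value in properties.items()}


def missing_properties(event_name: str, properties: dict[str, Any]) -> set[str]:
    """Return required keys that are absent. Unknown event names are rejected outright."""
    if event_name not in REQUIRED_PROPERTIES:
        raise ValueError(f"Unknown event name: {event_name!r}")
    return REQUIRED_PROPERTIES[event_name] - set(canonicalize(properties))
```

Rejecting unknown event names is a deliberate choice here: a misspelled or re-cased name fails loudly in your pipeline instead of silently creating a second event that no segment listens to.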

Step-by-Step Setup

Before you wire anything, write down the minimum set of retention-critical events and the identity rules you’ll follow. In practice, teams move faster when they treat this like a tracking contract: same names, same required properties, same IDs—every time.

  1. Define your identity strategy (the non-negotiable part).
    • Pick a stable primary identifier (typically email for D2C; sometimes a customer_id from Shopify/your backend).
    • Decide when anonymous becomes known (email capture pop-up, account creation, checkout step 1, etc.).
    • Document the merge rule: anonymous activity should merge into the known profile at identification time, not create a second “shadow” customer.
  2. Choose your data-in method per source.
    • Use direct integrations where they’re strong (e.g., ecommerce platform + payments) and APIs/SDKs for custom behavior and site/app events.
    • Keep one “source of truth” for orders/refunds so revenue-based segments don’t drift.
  3. Standardize your retention event taxonomy.
    • Core ecommerce events most D2C brands need: Product Viewed, Added to Cart, Checkout Started, Order Completed, Order Refunded.
    • Decide required properties for each event (e.g., for Added to Cart: cart_id, value, currency, items with SKU + quantity).
    • Lock naming (including capitalization and spacing). “Order Completed” vs “order_completed” becomes a segmentation bug later.
  4. Map person attributes you’ll actually use for retention.
    • Examples: orders_count, lifetime_value, last_order_at, first_order_at, sms_consent, email_consent, preferred_category.
    • Make sure types are consistent (timestamps as timestamps, numbers as numbers). Mixed types break “within past X days” logic.
  5. Implement identification + event sending in the right order.
    • When a user becomes known, identify/update the person first, then send events tied to that identity.
    • If you must send events pre-identification, confirm you’re merging anonymous activity into the identified profile (otherwise those events won’t qualify the person for journeys).
  6. Validate in Customer.io with real journeys/segments (not just logs).
    • Create a test segment like “Added to Cart in last 1 hour AND no Order Completed in last 1 hour.”
    • Run a real test: add to cart, wait 2 minutes, purchase, refund—confirm segment membership changes as expected.
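
Before running the test in step 6, it can help to write down what you expect the segment to do. The sketch below is a local model of the test segment's logic only, assuming a per-person timeline of (event name, time) pairs. It says nothing about how Customer.io evaluates segments internally, and the function names are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

Event = tuple[str, datetime]  # (event name, time the event occurred)


def happened_within(events: list[Event], name: str, now: datetime, window: timedelta) -> bool:
    """True if an event with this exact name occurred in the window ending at `now`."""
    return any(n == name and now - window <= ts <= now for n, ts in events)


def in_cart_abandon_segment(events: list[Event], now: datetime) -> bool:
    """Carted in the last hour AND no order completed in the last hour."""
    window = timedelta(hours=1)
    carted = happened_within(events, "Added to Cart", now, window)
    purchased = happened_within(events, "Order Completed", now, window)
    return carted and not purchased


# Walk through the manual test: add to cart, then purchase.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
timeline: list[Event] = [("Added to Cart", now - timedelta(minutes=2))]
before_purchase = in_cart_abandon_segment(timeline, now)  # expected: in segment
timeline.append(("Order Completed", now))
after_purchase = in_cart_abandon_segment(timeline, now)  # expected: out of segment
```

If the real segment disagrees with this expectation during your test, the gap is usually in naming, identity, or latency rather than in the segment definition.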

When Should You Use This Feature

You’ll feel the need for tighter Data In as soon as retention performance starts to look “random”: the flow logic is fine, but the audience is wrong. It usually shows up as missed triggers, duplicate profiles, or suppression failures.

  • Cart recovery that underperforms despite good creative. Classic scenario: a shopper adds a bundle to cart on mobile, later checks out on desktop. If identity isn’t merged, Customer.io sees two people—so your abandon series hits the wrong profile or doesn’t fire at all.
  • Repeat purchase programs that can’t time replenishment. If last_order_at isn’t reliably updated (or timestamps come in as strings), “30 days since last purchase” segments won’t hold and replenishment reminders will drift.
  • Reactivation that accidentally targets recent buyers. If Order Completed events lag or refunds/chargebacks aren’t represented, you’ll misclassify customers and burn trust with irrelevant winbacks.
  • VIP/CLV segmentation that doesn’t match finance. If orders come from multiple systems and you’re double-counting or missing adjustments, your “VIP” segment becomes a guess instead of an asset.
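
The mobile-then-desktop scenario above is easier to reason about with a toy model. The sketch below assumes a simplified `Profile` type invented for this example. It is not Customer.io's merge implementation, only an illustration of the outcome you want: one identifier carrying one time-ordered timeline.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    """Toy stand-in for a person record: one identifier, one event timeline."""

    identifier: str
    events: list[tuple[int, str]] = field(default_factory=list)  # (unix seconds, event name)


def merge_into_known(known: Profile, anonymous: Profile) -> Profile:
    """Fold the anonymous timeline into the known profile, ordered by time."""
    return Profile(known.identifier, sorted(known.events + anonymous.events))


# Anonymous cart activity on mobile, then identification at checkout on desktop.
mobile_visitor = Profile("anon-123", [(1_700_000_000, "Added to Cart")])
known_customer = Profile("jane@brand.com", [(1_700_000_600, "Checkout Started")])
merged = merge_into_known(known_customer, mobile_visitor)
```

Without the merge, "Added to Cart" stays on `anon-123` and the known profile never qualifies for a cart journey. With it, both events sit on the same person.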

Operational Considerations

Once data is flowing, the work shifts to keeping it stable as the site, apps, and tools evolve. In practice, this tends to break during platform migrations, checkout changes, new subscription apps, or when someone “just adds one property” without updating downstream segments.

  • Segmentation correctness depends on strict schemas. Treat event/property names like APIs: version them, document them, and don’t change them casually. If you must change, dual-send old + new for a transition period.
  • Orchestration needs deterministic suppression signals. Cart recovery, post-purchase, and winback all rely on clean suppression (e.g., “exit if Order Completed”). If your purchase event is delayed, add a buffer delay before first send or pull order status from the most reliable source.
  • Duplicate people poison performance reporting. If the same customer exists as multiple profiles (email variations, phone-only, anonymous-only), you’ll see inflated audience sizes and messy conversion attribution. Plan for periodic dedupe checks, not just one-time cleanup.
  • Anonymous-to-known merging is a retention lever. Email capture is only valuable if it stitches browsing/cart intent into the identified profile. Otherwise, you’re collecting emails that don’t qualify for behavior-based journeys.
  • Data latency is a real constraint. Be explicit about acceptable delays per event type (cart events: within seconds; order completion: ideally within a few minutes). Build journey timing around reality, not hope.
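
One way to turn the latency and suppression points above into a number is to size the first-send delay from observed order-event lag. This is a rough sketch under stated assumptions: you measure the lag yourself (for example, order time in your backend versus arrival time in your pipeline), and the five-minute safety margin is an arbitrary placeholder, not a recommendation.

```python
from datetime import timedelta


def first_send_delay(
    base_delay: timedelta,
    order_event_lags: list[timedelta],
    safety_margin: timedelta = timedelta(minutes=5),  # placeholder value, tune for your stack
) -> timedelta:
    """Wait at least as long as the slowest observed order event plus a margin."""
    if not order_event_lags:
        return base_delay
    return max(base_delay, max(order_event_lags) + safety_margin)
```

With a 30-minute base delay and observed lags of 2 and 45 minutes, this returns 50 minutes, so the purchase event has time to arrive and suppress the send. Using the worst observed lag is conservative. A high percentile may be a better fit if you have enough samples.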

Implementation Checklist

If you want this to hold up over time, treat the checklist as your “definition of done” before you scale spend or start heavy A/B testing. Otherwise you’ll optimize messaging on top of shaky inputs.

  • Primary identifier chosen (email and/or customer_id) and consistently sent
  • Anonymous identification moment defined and implemented (capture/login/checkout)
  • Anonymous activity merge behavior verified with a real test user
  • Retention-critical events implemented with standardized names
  • Required event properties documented and enforced (SKU, value, currency, items, cart_id/order_id)
  • Order/refund source of truth established (no double-sending from multiple systems)
  • Person attributes mapped with correct data types (timestamps/numbers/booleans)
  • At least 3 validation segments built to confirm behavior (cart abandon, recent purchaser, lapsed)
  • Latency expectations documented and reflected in journey delays/suppressions
  • Ongoing monitoring plan (weekly spot checks + alerts for schema drift)

Expert Implementation Tips

The difference between “it’s integrated” and “it drives revenue” is usually a handful of operator decisions that keep audiences clean and triggers dependable.

  • Send one canonical purchase event. If Shopify, your backend, and a subscription tool all emit “order completed,” pick one to be canonical and have the others enrich via attributes or separate event names.
  • Make cart identity explicit. Include cart_id and items on cart events so you can suppress or branch on “cart changed” vs “same cart still open.” This is how you avoid sending a recovery message for items they removed.
  • Store both computed and raw values. Keep order_total and also store subtotal, discount, shipping, tax when possible. It makes VIP logic and offer testing far less brittle.
  • Use timestamps you control. Prefer server-side timestamps for purchases over client-side to avoid time zone/device clock weirdness—especially when building “within past X hours” segments.
  • Build a “data quality” segment. Example: “Added to Cart in last 7 days AND email is blank” to quantify how much intent you’re failing to identify (and whether your capture/merge is working).
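
The “cart changed” vs “same cart still open” distinction from the cart identity tip can be sketched as a comparison between the cart event that triggered the journey and the most recent cart event. The payload shape (`cart_id`, and `items` entries with `sku` and `quantity`) follows the properties named in this guide. The function names and return labels are hypothetical.

```python
from typing import Any


def quantities_by_sku(items: list[dict[str, Any]]) -> dict[str, int]:
    """Collapse an items array into SKU -> total quantity, ignoring item order."""
    totals: dict[str, int] = {}
    for item in items:
        totals[item["sku"]] = totals.get(item["sku"], 0) + item["quantity"]
    return totals


def compare_carts(trigger_event: dict[str, Any], latest_event: dict[str, Any]) -> str:
    """Classify the latest cart event relative to the one that started the journey."""
    if trigger_event["cart_id"] != latest_event["cart_id"]:
        return "new_cart"
    if quantities_by_sku(trigger_event["items"]) != quantities_by_sku(latest_event["items"]):
        return "cart_changed"
    return "same_cart"
```

Comparing by SKU and quantity rather than raw array equality means a reordered items array does not register as a change, while a removed item does.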

Common Mistakes to Avoid

Most retention issues blamed on creative are actually caused by sloppy event semantics or identity gaps. These are the ones we see repeatedly in D2C.

  • Different event names for the same behavior. “Checkout Started” vs “Begin Checkout” splits your audience and makes triggers unreliable.
  • Sending purchase events late (or not at all) during peak traffic. If your order pipeline retries or batches, your cart recovery will hit buyers unless you add a delay/suppression buffer.
  • Creating new profiles instead of merging. Email captured in a pop-up creates a person, but cart events remain on an anonymous profile—so the person never qualifies for the abandon journey.
  • Overloading person attributes with event-like data. Don’t store “last_product_viewed” as a single attribute if you need real browsing history; send events so segments can evaluate recency/frequency properly.
  • Inconsistent data types. A timestamp sent sometimes as an ISO string and sometimes as a Unix epoch will quietly break “within past X days” conditions.
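
For the mixed-timestamp case, one option is to normalize every timestamp to a single representation before sending it. The sketch below assumes you standardize on Unix seconds and that timezone-naive strings are UTC. Both are assumptions to confirm against your sources, and `to_unix_seconds` is a hypothetical helper, not part of any SDK.

```python
from datetime import datetime, timezone
from typing import Union


def to_unix_seconds(value: Union[int, float, str]) -> int:
    """Normalize a Unix epoch or ISO 8601 string to whole Unix seconds."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python; reject it explicitly.
        raise TypeError("Booleans are not timestamps")
    if isinstance(value, (int, float)):
        return int(value)
    if isinstance(value, str):
        text = value.strip()
        if text.endswith("Z"):
            # Older Python versions do not parse a trailing "Z" in fromisoformat.
            text = text[:-1] + "+00:00"
        parsed = datetime.fromisoformat(text)
        if parsed.tzinfo is None:
            # Assumption: naive timestamps are UTC. Verify this per source.
            parsed = parsed.replace(tzinfo=timezone.utc)
        return int(parsed.timestamp())
    raise TypeError(f"Unsupported timestamp type: {type(value).__name__}")
```

Applying this to attributes like last_order_at and first_order_at at the edge of your pipeline keeps every source on the same type, so recency conditions evaluate against comparable values.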

Summary

If your data-in layer is clean, Customer.io becomes predictable: segments match reality and triggers fire when they should.

Prioritize identity resolution, canonical event naming, and purchase/refund accuracy before you scale retention journeys.

Implement Page Spec with Propel

If you’re tightening up Data In and want it to survive platform changes, treat the tracking plan like an operational spec: identifiers, event contracts, required properties, and validation segments. That’s the work that keeps Customer.io performing quarter after quarter—especially for cart recovery and repeat purchase flows where timing and suppression matter.

If you want help pressure-testing your identity + event schema against your retention roadmap, you can book a strategy call and we’ll walk through where segmentation and triggers typically break in real D2C stacks.
