Understanding Incoming Data in Customer.io (for retention operators)

Overview

If you’re running retention in Customer.io, incoming data is the difference between “Journeys that print money” and “Journeys that randomly miss people.” If you want a second set of eyes on your event taxonomy, identity rules, and trigger reliability, book a strategy call—this is usually where performance quietly leaks.

At a practical level, incoming data is everything you send into Customer.io: people (profiles + attributes) and events (behavior). Retention programs live and die on whether those inputs are consistent, mapped correctly, and tied to the right person at the right time.

How It Works

Customer.io doesn’t “guess” what happened in your store—your systems tell it. In most D2C stacks, that means your ecommerce platform, your site/app tracking, and your backend (orders, subscriptions, fulfillment) are all pushing data in. The job is to make sure those streams resolve to a single customer profile and produce events that your segments and triggers can trust.

  • People are the anchor. A person profile is keyed by an identifier (commonly an email address or an internal customer ID). Attributes (like first_name, customer_type, last_order_at, lifetime_value) attach to that profile.
  • Events are the fuel. Events represent actions (like product_viewed, added_to_cart, checkout_started, order_completed). Events can include properties (SKU, cart value, category) that power personalization and branching.
  • Identity resolution decides “who did this.” If a shopper browses anonymously and later enters an email at checkout, you need a plan for merging that anonymous activity into the known profile. In practice, this tends to break when teams track “email” sometimes, “customer_id” other times, and never reconcile the two.
  • Segmentation depends on consistent types and timestamps. If last_order_at is sometimes a string and sometimes a timestamp, your “Purchased in last 30 days” segment will silently misbehave. Same story for event time—if you backfill orders without setting the right event timestamp, recency logic becomes unreliable.
  • Triggers fire off specific incoming signals. A cart recovery journey is only as good as the added_to_cart / checkout_started events (and the absence of an order_completed event afterward). If those events arrive late, duplicated, or under the wrong profile, you’ll spam buyers or miss abandoners.
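
The identity-resolution bullet above can be sketched in a few lines. This is a minimal in-memory illustration, not Customer.io’s own merge logic: the Profile class, the identify function, and the anonymous_id key are hypothetical names chosen for the example, and timestamps are assumed to be Unix seconds.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    """A known person keyed by one canonical identifier (hypothetical model)."""
    customer_id: str
    email: str
    events: list = field(default_factory=list)


def identify(profile, anonymous_events, anonymous_id):
    """Attach events recorded under anonymous_id to the known profile.

    Returns the anonymous events that were not claimed, so they stay
    queued for whoever identifies later.
    """
    remaining = []
    for event in anonymous_events:
        if event["anonymous_id"] == anonymous_id:
            profile.events.append(event)
        else:
            remaining.append(event)
    # Keep history in the order the behavior happened, not the merge order.
    profile.events.sort(key=lambda e: e["timestamp"])
    return remaining


anonymous_events = [
    {"anonymous_id": "anon-1", "name": "product_viewed", "timestamp": 1700000000},
    {"anonymous_id": "anon-2", "name": "product_viewed", "timestamp": 1700000050},
    {"anonymous_id": "anon-1", "name": "added_to_cart", "timestamp": 1700000100},
]

shopper = Profile(customer_id="cust_42", email="shopper@example.com")
unclaimed = identify(shopper, anonymous_events, "anon-1")
print([e["name"] for e in shopper.events])  # ['product_viewed', 'added_to_cart']
print(len(unclaimed))  # 1
```

The point of the sketch is the ordering: browse and cart history lands on the known profile at the identify moment, so a cart journey can see it.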

Step-by-Step Setup

The goal here isn’t “send more data.” It’s to send the minimum clean set of people + events that make your retention automations deterministic. Start with one high-impact flow (cart recovery or post-purchase) and build outward.

  1. Pick your primary identifier and stick to it. Decide what Customer.io should treat as the canonical key (email vs internal customer ID). If you use multiple identifiers across tools, document how they map.
  2. Define your event taxonomy for retention. Lock names and required properties for core events:
    • product_viewed (sku, product_id, category)
    • added_to_cart (sku(s), quantity, cart_value, currency)
    • checkout_started (cart_value, items, checkout_id)
    • order_completed (order_id, total, items, discount_code, is_subscription)
  3. Map person attributes that power segmentation. Keep these tight and operational:
    • first_order_at, last_order_at
    • orders_count
    • lifetime_value
    • subscriber_status (if you run subscriptions)
    • acquisition_source (optional, but useful for winback targeting)
  4. Handle anonymous-to-known merging intentionally. Decide where the “identify” moment happens (email capture, account creation, checkout). Make sure anonymous events can be attached to the known person once identified—otherwise your browse/cart history disappears right when it matters.
  5. Validate data types and timestamps. Ensure timestamps are in a consistent format and represent when the behavior happened (not when your integration processed it). This is critical for “within the last X days” segments and delay logic.
  6. QA with real profiles, not test-only events. Create a small QA cohort (internal emails). Run through: view product → add to cart → start checkout → purchase. Confirm the profile shows the right events in the right order and that segments update as expected.
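
Steps 2 and 5 can be enforced with a small pre-send check. The sketch below is one possible shape, not a Customer.io feature: validate_event and to_unix_seconds are hypothetical helpers, the event dict layout (name, timestamp, properties) is an assumption, and "skus" stands in for the "sku(s)" property from step 2.

```python
from datetime import datetime, timezone

# Required properties per core event, copied from the taxonomy in step 2.
REQUIRED_PROPERTIES = {
    "product_viewed": {"sku", "product_id", "category"},
    "added_to_cart": {"skus", "quantity", "cart_value", "currency"},
    "checkout_started": {"cart_value", "items", "checkout_id"},
    "order_completed": {"order_id", "total", "items", "discount_code", "is_subscription"},
}


def to_unix_seconds(value):
    """Normalize an ISO 8601 string or a number to integer Unix seconds (UTC)."""
    if isinstance(value, bool):
        raise TypeError("boolean is not a timestamp")
    if isinstance(value, (int, float)):
        return int(value)
    parsed = datetime.fromisoformat(value)  # raises TypeError/ValueError on bad input
    if parsed.tzinfo is None:
        # Assumption: timestamps without an offset are already in UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return int(parsed.timestamp())


def validate_event(event):
    """Return a list of problems; an empty list means the event passes."""
    required = REQUIRED_PROPERTIES.get(event.get("name"))
    if required is None:
        return [f"unknown event name: {event.get('name')!r}"]
    problems = []
    missing = sorted(required - set(event.get("properties", {})))
    if missing:
        problems.append(f"missing properties: {missing}")
    try:
        to_unix_seconds(event.get("timestamp"))
    except (TypeError, ValueError):
        problems.append("timestamp is missing or unparseable")
    return problems


good = {"name": "checkout_started", "timestamp": "2024-01-15T10:30:00+00:00",
        "properties": {"cart_value": 84.0, "items": [], "checkout_id": "chk_1"}}
bad = {"name": "order_completed", "timestamp": "yesterday",
       "properties": {"order_id": "ord_1", "total": 84.0}}
print(validate_event(good))  # []
print(validate_event(bad))
```

Running a check like this against the QA cohort in step 6 surfaces schema drift before it reaches a segment.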

When Should You Use This Feature

“Incoming data” isn’t a single feature—it’s the foundation for every retention workflow you’ll run. The best time to tighten it up is right before you scale automations or when you notice weird edge cases (missed triggers, duplicate sends, buyers getting cart emails).

  • Cart recovery that actually suppresses purchasers. If order_completed arrives late or under a different identifier, you’ll send abandonment emails to people who already bought.
  • Post-purchase cross-sell based on what they bought. You need clean order_completed item data (SKU/category) to recommend the right next product and avoid irrelevant upsells.
  • Repeat purchase timing (replenishment). This depends on accurate purchase timestamps and product-level properties (e.g., “90-day supply”). Without that, your replenishment journey becomes guesswork.
  • Winback/reactivation that targets true lapsers. Your “no purchase in 120 days” segment must be built on reliable last_order_at or purchase events—or you’ll either miss churned customers or discount active ones.

Real D2C scenario: A skincare brand runs a 2-step cart recovery (1 hour + 20 hours). They track checkout_started from Shopify, but order_completed comes from a backend job that posts only once every 6 hours. Result: a shopper who buys after the job’s most recent run still looks like an abandoner when the 20-hour step fires, so buyers get the “Still thinking it over?” email. Fixing incoming data timing (or using a more real-time purchase event source) cleans this up immediately.
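
The timing in that scenario can be worked through with simple arithmetic. The model below is a simplification under stated assumptions: the batch job runs at minute 0, N, 2N and so on, it posts every order placed up to its run time, and a recovery step is suppressed only if the order is already visible when the step fires. Both function names are hypothetical.

```python
def order_visible_at(purchase_minute, batch_interval_minutes):
    """Minute at which the batch job first posts an order placed at purchase_minute."""
    # Integer ceiling division avoids float rounding.
    runs_needed = -(-purchase_minute // batch_interval_minutes)
    return runs_needed * batch_interval_minutes


def buyer_gets_email(checkout_minute, purchase_minute, send_delay_minutes,
                     batch_interval_minutes):
    """True if a recovery step fires for someone who had already purchased."""
    send_minute = checkout_minute + send_delay_minutes
    purchased_before_send = purchase_minute <= send_minute
    visible = order_visible_at(purchase_minute, batch_interval_minutes)
    return purchased_before_send and visible > send_minute


TWENTY_HOURS = 20 * 60
SIX_HOURS = 6 * 60

# Checkout at minute 30, purchase 19 hours later (minute 1170).
# The 20-hour step fires at minute 1230; the next batch run is minute 1440.
print(buyer_gets_email(30, 1170, TWENTY_HOURS, SIX_HOURS))  # True

# Purchase 17 hours after checkout (minute 1050) is posted at minute 1080.
print(buyer_gets_email(30, 1050, TWENTY_HOURS, SIX_HOURS))  # False

# Same late purchase with a near-real-time source (1-minute interval).
print(buyer_gets_email(30, 1170, TWENTY_HOURS, 1))  # False
```

Under this model the mis-send window is every purchase made between the last batch run and the send time, which is why shortening the interval shrinks the problem rather than just moving it.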

Operational Considerations

This is where retention programs usually get messy: multiple data sources, inconsistent IDs, and “close enough” mappings that create unreliable segments. Treat data flow like production infrastructure—because your revenue automations depend on it.

  • Segmentation accuracy depends on a single source of truth. If last_order_at is updated by two systems (Shopify + subscription platform), you’ll get race conditions. Pick one owner for each attribute.
  • Event duplication will inflate frequency and break logic. Double-sent added_to_cart events can restart journeys or re-qualify people unexpectedly. Deduplicate using stable IDs (cart_id, checkout_id, order_id) where possible.
  • Latency changes customer experience. A cart event that arrives 30 minutes late means your “send in 15 minutes” message hits 45 minutes after abandonment. That’s often the difference between recovery and irrelevance.
  • Orchestration across channels needs consistent consent + identifiers. If SMS consent lives on the person profile but email is the identifier used for events, make sure the same person record holds both—otherwise your “email then SMS” sequence won’t behave.
  • Backfills can wreck recency-based segments. When importing historical orders/events, ensure timestamps reflect historical reality. Otherwise, a bulk import can accidentally drop thousands of customers into “Purchased today” segments.
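
Deduplicating on stable IDs, as the second bullet suggests, might look like the sketch below. It is an illustration to run in your own pipeline before events are sent, not something Customer.io does for you. The DEDUPE_FIELDS mapping is an assumption: pairing cart_id with sku for added_to_cart treats a second add of the same SKU to the same cart as a duplicate, which may not suit every store.

```python
# Which properties make an event unique, per event name (assumed mapping).
DEDUPE_FIELDS = {
    "added_to_cart": ("cart_id", "sku"),
    "checkout_started": ("checkout_id",),
    "order_completed": ("order_id",),
}


def dedupe_key(event):
    """Return a hashable key for the event, or None if it cannot be deduped safely."""
    fields = DEDUPE_FIELDS.get(event["name"])
    props = event.get("properties", {})
    if fields is None or any(f not in props for f in fields):
        return None
    return (event["name"],) + tuple(props[f] for f in fields)


def dedupe(events):
    """Keep the first occurrence of each key, preserving the original order."""
    seen = set()
    unique = []
    for event in events:
        key = dedupe_key(event)
        if key is not None:
            if key in seen:
                continue  # Retry or double-fire of an event already kept.
            seen.add(key)
        unique.append(event)  # Events without a stable ID pass through untouched.
    return unique


events = [
    {"name": "order_completed", "properties": {"order_id": "ord_1"}},
    {"name": "order_completed", "properties": {"order_id": "ord_1"}},
    {"name": "added_to_cart", "properties": {"cart_id": "cart_9", "sku": "SERUM-30"}},
    {"name": "added_to_cart", "properties": {"cart_id": "cart_9", "sku": "TONER-100"}},
    {"name": "added_to_cart", "properties": {"cart_id": "cart_9", "sku": "SERUM-30"}},
]
print(len(dedupe(events)))  # 3
```

Events that lack a stable ID are deliberately passed through rather than dropped, so missing IDs show up as a data-quality gap instead of as silently lost events.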

Implementation Checklist

Before you trust any retention journey at scale, run this checklist once. It’s faster than debugging after you’ve already sent 50,000 wrong messages.

  • Primary identifier defined (and documented) across all sources
  • Anonymous-to-known merge plan implemented and tested
  • Core retention events standardized (names + required properties)
  • Order events include order_id and item-level details (SKU/category)
  • Key person attributes mapped with consistent data types (timestamps as timestamps)
  • Deduplication strategy for orders/carts/checkouts
  • Latency measured for each event source (site vs backend vs platform sync)
  • QA cohort validated: events appear on the right profile in correct order
  • Suppression logic validated: purchasers do not receive cart recovery
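
For the latency item, one rough way to measure it is to compare when the behavior happened with when the event was received. The sketch assumes you can export or log both values as Unix seconds; the field names occurred_at, received_at, and source are hypothetical.

```python
from statistics import median


def latency_by_source(events):
    """Report median and worst-case latency in seconds for each event source."""
    lags = {}
    for event in events:
        lag = event["received_at"] - event["occurred_at"]
        lags.setdefault(event["source"], []).append(lag)
    return {
        source: {"median_s": median(values), "max_s": max(values)}
        for source, values in lags.items()
    }


sample = [
    {"source": "site", "occurred_at": 1700000000, "received_at": 1700000002},
    {"source": "site", "occurred_at": 1700000100, "received_at": 1700000104},
    {"source": "site", "occurred_at": 1700000200, "received_at": 1700000203},
    {"source": "backend", "occurred_at": 1700000000, "received_at": 1700001800},
    {"source": "backend", "occurred_at": 1700000000, "received_at": 1700021000},
]
report = latency_by_source(sample)
print(report["site"])
print(report["backend"])
```

Looking at the worst case alongside the median matters here: a delayed journey is sized by the slowest events, not the typical ones.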

Expert Implementation Tips

These are the small operational choices that make Customer.io feel “predictable” instead of “haunted.” In most retention programs, we’ve seen these tips reduce misfires and improve recovery rates without changing creative.

  • Prefer event-based truth over derived attributes for triggers. Use order_completed to exit cart journeys rather than relying only on last_order_at, which can be overwritten or delayed.
  • Include stable IDs on every commerce event. Add cart_id, checkout_id, and order_id so you can dedupe and debug quickly.
  • Keep properties “analysis-ready.” Normalize currency, store item arrays consistently, and avoid mixing strings/numbers (e.g., cart_value should always be a number).
  • Design for partial identity. Treat email capture as a first-class event (e.g., email_captured) so you can merge behavior and start flows even before purchase.
  • Build a small “data health” segment. Example: people with added_to_cart in the last 7 days AND a missing email. This flags tracking gaps that directly hurt recovery revenue.
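
In Customer.io the “data health” check would normally be built as a segment. The same rule can also be run offline against exported data, which is what this sketch does. The people structure (id, email, events) and the function name are assumptions for illustration, with timestamps in Unix seconds.

```python
SECONDS_PER_DAY = 86400


def cart_without_email(people, now, window_days=7):
    """Return IDs of people with a recent added_to_cart event and no email."""
    cutoff = now - window_days * SECONDS_PER_DAY
    flagged = []
    for person in people:
        has_recent_cart = any(
            event["name"] == "added_to_cart" and event["timestamp"] >= cutoff
            for event in person["events"]
        )
        # An empty string counts as missing, same as None.
        if has_recent_cart and not person.get("email"):
            flagged.append(person["id"])
    return flagged


now = 1700000000
people = [
    {"id": "a", "email": None,
     "events": [{"name": "added_to_cart", "timestamp": now - 2 * SECONDS_PER_DAY}]},
    {"id": "b", "email": "b@example.com",
     "events": [{"name": "added_to_cart", "timestamp": now - 1 * SECONDS_PER_DAY}]},
    {"id": "c", "email": None,
     "events": [{"name": "added_to_cart", "timestamp": now - 10 * SECONDS_PER_DAY}]},
]
print(cart_without_email(people, now))  # ['a']
```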

Common Mistakes to Avoid

Most issues aren’t “Customer.io problems.” They’re mismatched identifiers, inconsistent schemas, or silent timestamp bugs. Fixing these usually unlocks performance fast.

  • Tracking the same customer as multiple people. Email-based profiles from one source and ID-based profiles from another lead to split histories and broken suppression.
  • Using inconsistent event names across platforms. If your site sends AddToCart but your backend sends added_to_cart, they arrive as two different events, and any segment or trigger built on one name never sees the other.
  • Missing item-level order data. Without SKUs/categories, your cross-sell and replenishment logic becomes generic—and conversion drops.
  • Backfilling without correct timestamps. This can accidentally trigger “new buyer” or “recent purchase” journeys for old orders.
  • Not measuring latency. Teams assume events are real-time until they’re not, then wonder why recovery messages are late.
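
The event-name mistake is cheap to guard against with an alias map applied before anything is sent. This is a sketch under assumptions: only AddToCart comes from the example above, and the other aliases are made-up variants to show the pattern. Unknown names raise instead of passing through, so new variants surface during QA.

```python
CANONICAL_NAMES = {"product_viewed", "added_to_cart", "checkout_started", "order_completed"}

# Variant spellings mapped to the canonical name (hypothetical except AddToCart).
ALIASES = {
    "AddToCart": "added_to_cart",
    "add_to_cart": "added_to_cart",
    "ProductViewed": "product_viewed",
    "CheckoutStarted": "checkout_started",
    "OrderCompleted": "order_completed",
}


def canonical_event_name(raw_name):
    """Map a source-specific event name onto the shared taxonomy."""
    if raw_name in CANONICAL_NAMES:
        return raw_name
    if raw_name in ALIASES:
        return ALIASES[raw_name]
    raise ValueError(f"event name not in taxonomy: {raw_name!r}")


print(canonical_event_name("AddToCart"))  # added_to_cart
print(canonical_event_name("order_completed"))  # order_completed
```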

Summary

Incoming data is the retention control plane: identity + events + timestamps. If those are clean, segments are accurate and triggers are reliable.

If you’re seeing missed entries, wrong suppressions, or inconsistent audience counts, fix the data model before you touch creative.

Implement Incoming Data with Propel

If you’re tightening up your data-in layer for Customer.io, focus on the boring parts: identifiers, dedupe keys, timestamps, and ownership of attributes. That’s what keeps cart recovery, post-purchase, and winback flows from drifting over time.

When it’s helpful, we’ll map your event taxonomy to the journeys you actually run and pressure-test identity resolution so your segments stay stable as you scale. If you want that operator-level walkthrough, book a strategy call.
