Overview
If your retention program runs on noisy event streams, you’re one bad payload away from broken segments and misfiring campaigns. Filtering incoming data is how you keep Customer.io fed with only the events and attributes you actually trust—so cart recovery, replenishment, and winback triggers fire when they should. If you want a second set of eyes on your tracking plan and filters, book a strategy call and we’ll pressure-test it like an operator would.
In most D2C stacks, the same “Add to Cart” can show up three different ways (web pixel, server event, and Shopify app). Without filters, Customer.io happily ingests all of it—and your abandonment flow sends twice, your holdouts get polluted, and attribution becomes a mess.
How It Works
Customer.io ingests data from your sources (Track API, SDKs, pipelines/integrations) as people updates and events. Incoming filters sit in that ingestion path and decide what gets accepted, rejected, or shaped before it becomes usable in segmentation and campaign triggers.
- Events: Filters can block unwanted events (ex: test traffic, internal QA, duplicate client-side events) so they never enter your event stream.
- Attributes: Filters can prevent bad profile updates (ex: overwriting email with null, writing “unknown” into phone, stamping malformed timestamps) that would otherwise break targeting and channel eligibility.
- Identity resolution implications: If your incoming data sometimes arrives anonymous and later identifies (common with carts), filtering affects whether you’ll have clean anonymous activity to merge later, or a pile of junk you can’t reliably stitch to a customer.
- Trigger reliability: Campaigns that start on “event performed” are only as good as the event stream. Filtering is basically insurance against accidental sends and false positives.
Real D2C scenario: You run a cart abandonment journey triggered by cart_updated. Your frontend fires it on every quantity change, and your backend also fires it when inventory is reserved. Without filtering/deduping rules, a single shopper can generate 8+ events in a minute, enter the journey multiple times, and get hammered with reminders. With incoming filters, you keep only the server-side “final” cart update (or only events with a stable cart_id and meaningful delta).
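The cart scenario above can be sketched in code. This is a minimal illustration of the rule (keep only server-side cart updates that carry a stable cart_id and actually change the cart), written as a check you might run before forwarding events. It is not Customer.io's filter syntax; the event shape, the source property, and the function name are assumptions made for the example.

```python
def keep_cart_event(event: dict, last_value_by_cart: dict) -> bool:
    """Return True if this cart_updated event should be forwarded.

    `event` is a hypothetical payload: {"name": ..., "properties": {...}}.
    `last_value_by_cart` remembers the last forwarded value per cart_id.
    """
    props = event.get("properties") or {}
    cart_id = props.get("cart_id")

    # No stable cart_id means the event cannot be deduplicated: drop it.
    if not cart_id:
        return False

    # Keep only the server-side copy of the event (assumed "source" tag).
    if props.get("source") != "server":
        return False

    # Drop updates that do not change the cart value (no meaningful delta).
    value = props.get("value")
    if cart_id in last_value_by_cart and last_value_by_cart[cart_id] == value:
        return False

    last_value_by_cart[cart_id] = value
    return True
```

Comparing on value alone is a deliberate simplification. A real rule would likely compare the items as well, since two different carts can have the same total.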
Step-by-Step Setup
Before you touch Customer.io, get clear on what “good” looks like for each retention-critical event. You’re not filtering for elegance—you’re filtering so your segments and triggers match reality.
- List your retention-critical events and required fields.
  - Cart: cart_created / cart_updated with cart_id, items, value, currency, and ideally source (client/server).
  - Checkout: checkout_started with checkout_id.
  - Order: order_completed with order_id (must be unique), total, line_items.
- Identify the noise patterns you want to block.
  - Internal/admin traffic (your team testing flows).
  - Staging/dev environments accidentally pointing to prod.
  - Duplicate sources (pixel + server) for the same action.
  - Malformed payloads (missing IDs, empty arrays, null emails).
- Create incoming filters aligned to those patterns.
  - Block events where email ends with your company domain (common for QA) or where the is_test flag is true.
  - Block client-side purchase events if you’ve standardized on server-side for orders.
  - Reject events missing required identifiers (ex: drop order_completed without order_id).
- Protect key identity and channel fields.
  - Prevent overwriting email or phone with null/blank values.
  - Guard subscription/consent attributes so a bad sync doesn’t opt everyone out (or in).
- Validate against Activity Logs and live campaign entry.
  - Send a controlled set of events from each source and confirm what’s accepted.
  - Confirm segments update as expected (especially “has done X in last Y”).
  - Confirm a trigger fires once per intended action (not per event spam).
- Roll out gradually.
  - Start with the highest-risk filters (test traffic, staging, null overwrites).
  - Then tighten rules for dedupe/required fields once you’ve observed edge cases.
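To make the filter patterns listed above concrete, here is a rough sketch of the same rules as a single accept/reject check. It is illustrative only and does not represent how filters are configured inside Customer.io. The event shape, the @example.com domain, and the names check_event, REQUIRED_FIELDS, and SERVER_ONLY_EVENTS are all assumptions.

```python
from typing import Tuple

# Required identifiers per event, taken from the list above.
REQUIRED_FIELDS = {
    "cart_created": ["cart_id"],
    "cart_updated": ["cart_id"],
    "checkout_started": ["checkout_id"],
    "order_completed": ["order_id"],
}
INTERNAL_EMAIL_DOMAIN = "@example.com"   # hypothetical company domain
SERVER_ONLY_EVENTS = {"order_completed"}  # assumed server-side source of truth


def check_event(event: dict) -> Tuple[bool, str]:
    """Return (accepted, reason) for a hypothetical event payload."""
    name = event.get("name", "")
    props = event.get("properties") or {}
    email = (event.get("email") or "").lower()

    # Test and internal QA traffic.
    if props.get("is_test") is True:
        return False, "test flag set"
    if email.endswith(INTERNAL_EMAIL_DOMAIN):
        return False, "internal email domain"

    # Client-side copies of events standardized on server-side.
    if name in SERVER_ONLY_EVENTS and props.get("source") == "client":
        return False, "client-side copy of server-only event"

    # Missing required identifiers.
    missing = [f for f in REQUIRED_FIELDS.get(name, []) if not props.get(f)]
    if missing:
        return False, "missing required fields: " + ", ".join(missing)

    return True, "accepted"
```

Returning a reason alongside the decision makes the controlled test sends easier to audit, because each rejection can be traced to one rule.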
When Should You Use This Feature
Filtering matters most when you’re past “sending emails” and into “running systems.” If you’re using Customer.io to drive revenue, you need your inputs to be predictable—otherwise every new integration becomes a new failure mode.
- Cart recovery: When cart events are high-volume and noisy (quantity changes, multi-tab shoppers, pixel + server duplication).
- Repeat purchase and replenishment: When order events arrive from multiple systems (Shopify + subscription tool + ERP) and you need one canonical purchase signal.
- Reactivation: When “inactive” segments get corrupted by low-quality engagement events (ex: page views from bots, customer support tools firing “logged in” events).
- Channel eligibility: When SMS/email consent and deliverability fields are updated by multiple sources and accidental overwrites cause revenue drops.
Operational Considerations
This is where most teams get surprised: filtering isn’t just a data hygiene task—it directly changes who qualifies for segments and which journeys trigger. Treat it like production infrastructure for revenue automations.
- Segmentation accuracy: If you filter out a noisy event, any segment depending on “performed event” will shrink. That’s usually good, but it can also hide real buyers if your rules are too strict.
- Event canonicalization: Pick one source of truth per event type (server-side for purchases is the usual call). If you keep both, you’ll need a consistent dedupe key (order_id, cart_id) or your LTV and frequency logic will drift.
- Identity resolution and anonymous activity: If you filter anonymous events aggressively, you might reduce your ability to merge pre-email cart behavior into identified profiles later. In practice, keep anonymous cart events if they include a stable anonymous_id and cart_id, and filter the rest.
- Orchestration realities: Your ESP automations don’t care why an event is wrong; they’ll still send. Filters are the cheapest place to prevent downstream damage compared to patching every journey with defensive conditions.
- Monitoring: Add a lightweight review cadence (weekly at first) to scan Activity Logs for rejected/accepted patterns after new releases, theme changes, or app installs.
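The canonicalization point above depends on having a dedupe key. As a hedged sketch, the function below collapses purchase events that share an order_id and keeps the copy from the preferred source. The priority order, the flat event shape, and the function name are assumptions for illustration, not a description of Customer.io behavior.

```python
# Lower number wins. The ordering is an assumption; adjust to your own
# source-of-truth decision.
SOURCE_PRIORITY = {"server": 0, "shopify": 1, "client": 2}


def canonical_orders(events: list) -> list:
    """Keep one event per order_id, preferring the highest-priority source.

    Events without an order_id are dropped because they cannot be deduplicated.
    """
    best = {}  # order_id -> (rank, event)
    for event in events:
        order_id = event.get("order_id")
        if not order_id:
            continue
        # Unknown sources rank below every known one.
        rank = SOURCE_PRIORITY.get(event.get("source"), len(SOURCE_PRIORITY))
        current = best.get(order_id)
        if current is None or rank < current[0]:
            best[order_id] = (rank, event)
    return [event for _, event in best.values()]
```

The same shape works for cart events with cart_id as the key.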
Implementation Checklist
If you want this to hold up after the next site redesign or app install, you need a short checklist that keeps everyone honest—engineering, retention, and analytics.
- Define a canonical event list for retention (cart, checkout, purchase, subscription, refund/cancel).
- Document required properties per event (IDs, value, currency, source, timestamp format).
- Decide source-of-truth per event type (client vs server vs integration).
- Add filters for test/internal/staging traffic.
- Add guards to prevent null/blank overwrites on identity + consent fields.
- Implement a dedupe strategy (unique IDs + rules for duplicates).
- Verify segment counts before/after filters (expect changes; explain them).
- Run a live-fire test: trigger cart abandon + purchase and confirm no double entry.
- Set an owner and a change log for filter edits.
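For the “verify segment counts before/after filters” item, a small comparison like the one below can make the expected changes explicit. It assumes you have already exported segment sizes into two plain dictionaries by whatever means you use. The function name and the 20% alert threshold are hypothetical.

```python
def segment_deltas(before: dict, after: dict, alert_pct: float = 20.0) -> list:
    """Compare segment sizes and flag large swings.

    Returns (name, before, after, pct_change, needs_review) tuples,
    sorted by segment name.
    """
    report = []
    for name in sorted(before):
        old = before[name]
        new = after.get(name, 0)
        # Avoid dividing by zero for segments that were empty before.
        change = 0.0 if old == 0 else (new - old) * 100.0 / old
        report.append((name, old, new, round(change, 1), abs(change) >= alert_pct))
    return report
```

A flagged segment is not necessarily a problem. It is a prompt to explain whether the drop is removed noise or lost signal.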
Expert Implementation Tips
Once you’ve been burned by a double-send or a broken consent sync, you start designing filters like guardrails—not nice-to-haves.
- Tag every event with a source. Add source=client, source=server, source=shopify. Then filtering becomes simple and auditable.
- Prefer “allow lists” for high-stakes triggers. For purchase and subscription events, it’s safer to only accept events that meet strict criteria than to try to block every bad variant.
- Protect revenue journeys from event storms. Even with filters, add a journey-level frequency guard (e.g., “don’t re-enter within 4 hours”) for cart flows. Filters reduce noise; they don’t eliminate edge cases like multi-device shoppers.
- Keep anonymous cart events if you can merge later. For many D2C brands, the money is in recovering carts that happen before email capture. Don’t filter those out just because they’re anonymous—filter them based on quality (stable IDs, meaningful payload).
- Make malformed timestamps a hard fail. Time-based segments (“in the last 7 days”) break quietly when timestamps are wrong. Reject bad formats early.
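The “hard fail” on timestamps can be sketched as a strict parser. This example assumes events carry either Unix seconds or an ISO 8601 string with an explicit timezone; which formats you accept is your own call, and the function names and upper bound are illustrative.

```python
from datetime import datetime, timezone

# Upper bound of about the year 2100 in Unix seconds. Values above this are
# most likely milliseconds sent by mistake.
MAX_UNIX_SECONDS = 4102444800


def parse_event_timestamp(raw) -> datetime:
    """Return a timezone-aware UTC datetime, or raise ValueError."""
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(raw, bool):
        raise ValueError("boolean is not a timestamp")
    if isinstance(raw, (int, float)):
        if not 0 < raw < MAX_UNIX_SECONDS:
            raise ValueError("unix timestamp out of range")
        return datetime.fromtimestamp(raw, tz=timezone.utc)
    if isinstance(raw, str):
        text = raw.strip()
        # Older Python versions do not accept a trailing "Z".
        if text.endswith("Z"):
            text = text[:-1] + "+00:00"
        parsed = datetime.fromisoformat(text)  # raises ValueError if malformed
        if parsed.tzinfo is None:
            raise ValueError("timestamp has no timezone")
        return parsed.astimezone(timezone.utc)
    raise ValueError("unsupported timestamp type")


def is_valid_timestamp(raw) -> bool:
    """Convenience wrapper: True if the timestamp parses, False otherwise."""
    try:
        parse_event_timestamp(raw)
        return True
    except ValueError:
        return False
```

Rejecting timezone-less strings is stricter than some pipelines need, but it removes one common source of quiet drift in “in the last 7 days” segments.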
Common Mistakes to Avoid
Most issues aren’t caused by Customer.io—they’re caused by teams changing tracking upstream and forgetting that Customer.io is only as reliable as what you feed it.
- Filtering without measuring segment impact. You ship a filter, cart abandon segment drops 40%, and nobody knows if it’s fixed noise or lost signal.
- Over-filtering anonymous activity. You “clean up” data and accidentally remove the very behavior that powers pre-identification recovery.
- Letting multiple sources write identity fields. One integration sets email to blank or overwrites phone formatting, and SMS eligibility tanks.
- No dedupe key for purchases. If order_completed can arrive twice, your repeat purchase logic and VIP segmentation will drift fast.
- Relying on journey conditions instead of fixing ingestion. Patching every campaign is slower and more fragile than filtering the bad inputs once.
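The identity-field mistake above is one of the easier ones to guard against in code. As a rough sketch, the function below strips blank or placeholder values for protected fields from a profile update before it is sent. The set of protected fields and the placeholder strings are assumptions, and this is an upstream guard rather than a Customer.io setting.

```python
PROTECTED_FIELDS = {"email", "phone"}          # assumed identity/channel fields
PLACEHOLDER_VALUES = {"", "null", "unknown"}   # assumed junk values


def strip_blank_overwrites(attributes: dict) -> dict:
    """Return a copy of the update without blank writes to protected fields.

    Non-protected attributes pass through unchanged, including None values.
    """
    cleaned = {}
    for key, value in attributes.items():
        is_blank = value is None or (
            isinstance(value, str) and value.strip().lower() in PLACEHOLDER_VALUES
        )
        if key in PROTECTED_FIELDS and is_blank:
            continue  # skip the write so the existing value survives
        cleaned[key] = value
    return cleaned
```

Dropping the key instead of rejecting the whole update keeps the rest of the payload usable. Consent attributes probably deserve a stricter rule than this, such as accepting writes from a single source only.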
Summary
If your segments feel “off” or journeys misfire, start by fixing what enters Customer.io. Incoming filters keep your event stream trustworthy, your identity data stable, and your retention triggers predictable.
Use filters when you have multiple sources, noisy cart activity, or any risk of consent/identity overwrites—because that’s where revenue automations tend to break first.
Implement Filter Incoming Data with Propel
If you’re tightening filters because cart recovery is double-sending or your winback segment is polluted, treat it like a tracking system redesign—not a quick toggle. We’ll typically map your event sources, define a canonical schema, and set filters that protect segmentation and trigger reliability inside Customer.io. If you want help pressure-testing the plan (and avoiding the “we filtered out real buyers” mistake), book a strategy call.
The goal isn’t perfect data. It’s data you can confidently automate against—so your retention program scales without surprise sends or silent segment drift.