Deduplication: keep Attio clean forever

Every CRM eventually becomes a graveyard of duplicates. Someone imports a list. Someone's assistant types a contact with a typo. A webhook fires twice. Three months later you have four records for the same person, and nobody wants to merge them because nobody trusts which one is "right".

We fix this two ways. One-shot: a scripted migration that lands 100% duplicate-free on day one. Continuous: a dedup layer that catches duplicates the moment they're created, scores them on completeness, and merges into the best record automatically - with a dry-run preview and a human-in-the-loop gate for the ambiguous ones.

Direction

Attio-native + CSV ingest

Stack

Python, Attio API, Fuzzy matching, Optional FastAPI service

The what

What this integration actually does

At migration time, we ingest your existing CRM, legacy CSVs, spreadsheets - whatever you have - and produce a clean Attio workspace with no duplicate People, Companies, or Deals. After that, we run a lightweight dedup job (scheduled or webhook-triggered) that catches fresh duplicates before they spread, and we expose a UI where your team can resolve the edge cases.

The how

How we build it

1. Inventory every source of contact data: existing CRM, Mailchimp, event lists, founder inboxes, Google contacts. Duplicates compound across sources.
2. Score completeness: a record with phone + email + company + last activity beats a record with just an email. Merges keep the strongest fields.
3. Fuzzy-match on name + email + company with configurable thresholds. "John Smith @ Acme" and "J. Smith @ Acme Inc" are the same person.
4. Company-name validation to prevent merging two people named John Smith at different companies.
5. Always --dry-run first. Every merge shows a diff before it touches production data.
6. Deploy the ongoing dedup job on a schedule (nightly) or wire it to the create webhook for real-time catching.
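Steps 3 to 5 can be sketched in a few lines of Python. This is a minimal illustration, not the production matcher: it uses the standard library's difflib.SequenceMatcher as a stand-in for the similarity measures, and the thresholds, the suffix list, the record fields, and the function names (normalize, same_company, is_duplicate, run) are all assumptions made for the example.

```python
from difflib import SequenceMatcher

# Hypothetical, configurable thresholds (illustrative values only).
NAME_THRESHOLD = 0.60
COMPANY_THRESHOLD = 0.70
LEGAL_SUFFIXES = {"inc", "llc", "ltd", "gmbh", "corp", "co"}


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, drop common legal suffixes."""
    cleaned = "".join(c if c.isalnum() or c.isspace() else " " for c in text.lower())
    return " ".join(t for t in cleaned.split() if t not in LEGAL_SUFFIXES)


def same_company(a: dict, b: dict) -> bool:
    """Company gate: a missing company on either side cannot be validated, so refuse."""
    ca, cb = normalize(a.get("company", "")), normalize(b.get("company", ""))
    return bool(ca and cb) and SequenceMatcher(None, ca, cb).ratio() >= COMPANY_THRESHOLD


def is_duplicate(a: dict, b: dict) -> bool:
    if not same_company(a, b):
        return False  # never merge across companies
    ea = a.get("email", "").strip().lower()
    eb = b.get("email", "").strip().lower()
    if ea and ea == eb:
        return True  # identical email is the strongest contact signal
    name_score = SequenceMatcher(None, normalize(a["name"]), normalize(b["name"])).ratio()
    return name_score >= NAME_THRESHOLD


def run(records: list, dry_run: bool = True, apply=None) -> list:
    """Return proposed merge pairs; call `apply` only when dry_run is False."""
    pairs = [
        (a, b)
        for i, a in enumerate(records)
        for b in records[i + 1:]
        if is_duplicate(a, b)
    ]
    for a, b in pairs:
        if dry_run or apply is None:
            print(f"[dry-run] would merge {a['name']!r} and {b['name']!r}")
        else:
            apply(a, b)
    return pairs


records = [
    {"name": "John Smith", "email": "john@acme.com", "company": "Acme"},
    {"name": "J. Smith", "email": "", "company": "Acme Inc"},
    {"name": "John Smith", "email": "john.smith@globex.com", "company": "Globex"},
]
proposed = run(records)  # dry run by default: nothing is written
```

With these sample records, the two Acme entries are proposed as a pair, while the John Smith at Globex is held back by the company gate.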

Under the hood

What lives inside the pipeline

Fuzzy matching with Levenshtein + token-set ratio on names; exact match on email domain.
Completeness score: missing fields penalized, recent activity boosts the record.
Merge-safe field policy: phone numbers are unioned, notes are concatenated, and the company name comes from the most complete record.
CSV-driven batch mode for large one-off cleanups.
Optional SaaS version: dedupe-csv-saas (our own product) for teams that want a self-serve tool.
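The completeness score and the merge-safe field policy could look roughly like the sketch below. The field names, the one-point-per-field weighting, and the 90-day recency window are assumptions for illustration; the real weights are a tuning decision.

```python
from datetime import date

# Hypothetical scoring inputs: one point per populated field,
# plus one point for activity inside the recency window.
SCORED_FIELDS = ("email", "phones", "company", "notes")
RECENT_DAYS = 90


def completeness(record: dict, today: date) -> int:
    score = sum(1 for field in SCORED_FIELDS if record.get(field))
    last = record.get("last_activity")
    if last is not None and (today - last).days <= RECENT_DAYS:
        score += 1
    return score


def merge(a: dict, b: dict, today: date) -> dict:
    """Merge two records for the same person; the more complete one is the base."""
    winner, loser = sorted((a, b), key=lambda r: completeness(r, today), reverse=True)
    merged = dict(winner)
    # Phone numbers: union, keeping first-seen order and dropping repeats.
    merged["phones"] = list(dict.fromkeys(winner.get("phones", []) + loser.get("phones", [])))
    # Notes: concatenate the non-empty ones.
    merged["notes"] = "\n".join(n for n in (winner.get("notes"), loser.get("notes")) if n)
    # Company name: taken from the most complete record, with a fallback.
    merged["company"] = winner.get("company") or loser.get("company", "")
    # Any other field the winner lacks is filled from the loser.
    for key, value in loser.items():
        if not merged.get(key):
            merged[key] = value
    return merged


today = date(2024, 6, 1)
a = {"name": "John Smith", "email": "john@acme.com", "phones": ["+1 555 0100"],
     "company": "Acme Inc", "notes": "Met at demo day", "last_activity": date(2024, 5, 1)}
b = {"name": "J. Smith", "email": "john@acme.com",
     "phones": ["+1 555 0100", "+1 555 0199"], "company": "", "notes": "Prefers email"}
result = merge(a, b, today)
```

Nothing is dropped silently in this sketch: the losing record only ever adds phone numbers, notes, and missing fields to the winner.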

Hard-earned lessons

What we learned the hard way

Merging across companies is the #1 way to corrupt a CRM. The validation for "same person, different company" is not optional.
Attio rate limits are strict on bulk ops - always pace merges with backoff.
Never run a merge script on production without --dry-run first. Always.
Scoring heuristics matter more than matching heuristics. Getting the "which one wins" logic right is where trust is built.
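Pacing merges with backoff can be as simple as the wrapper below. It is a generic sketch and deliberately contains no Attio client code: RateLimited is a hypothetical exception your own merge function would raise when the API responds with HTTP 429, and the retry count and base delay are illustrative.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical: raised by the caller's merge function on an HTTP 429 response."""


def with_backoff(call, max_attempts: int = 5, base_delay: float = 1.0, sleep=time.sleep):
    """Run `call`, retrying with exponential backoff plus jitter when rate limited."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # give up after the final attempt
            # 1s, 2s, 4s, ... plus random jitter so parallel workers do not retry in lockstep.
            delay = base_delay * 2 ** attempt + random.uniform(0, base_delay)
            sleep(delay)
```

The sleep parameter is injectable so the pacing logic can be tested without actually waiting.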

Case study

Anker Capital - VC

Problem

Anker had 1,200 investor records coming in from multiple broker lists with inconsistent naming, formatting, and heavy overlap across sources.

Solution

Python dedup scripts with fuzzy matching and completeness scoring. Consolidated 1,032 of 1,200 investors automatically; the remaining 168 went to a human review queue for ambiguous merges.
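The split between automatic merges and the review queue comes down to two cut-offs on the match score. The sketch below shows the idea only; the threshold values and names are hypothetical and are not the ones used on this project.

```python
# Hypothetical cut-offs on a 0..1 match score.
AUTO_MERGE_AT = 0.90
REVIEW_AT = 0.70


def route(score: float) -> str:
    """Decide what happens to a candidate pair based on its match score."""
    if score >= AUTO_MERGE_AT:
        return "auto_merge"
    if score >= REVIEW_AT:
        return "review"  # ambiguous: a human decides
    return "keep_separate"


def triage(candidates: list) -> dict:
    """Sort (pair_id, score) tuples into the three queues."""
    queues = {"auto_merge": [], "review": [], "keep_separate": []}
    for pair_id, score in candidates:
        queues[route(score)].append(pair_id)
    return queues


queues = triage([("p1", 0.97), ("p2", 0.81), ("p3", 0.42)])
```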

Outcome

Clean foundation for every downstream automation. No "wait, which record is real?" threads. The ongoing job keeps it that way.

FAQ

Questions we get

Yes. We run in dry-run first, show you the proposed merges, and execute only what you sign off on.

Yes - dedupe-csv-saas. Free for up to 250 contacts, then €0.01 per contact. Useful for teams that want to dedupe before importing to any CRM.

Same pipeline, different matcher. Company dedup uses domain as the strongest signal; contact dedup uses email.
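In code, "same pipeline, different matcher" can mean swapping only the key function. A rough sketch, assuming records are plain dicts with hypothetical website and email fields:

```python
from collections import defaultdict
from urllib.parse import urlparse


def company_key(website: str) -> str:
    """Reduce a website value to a bare domain, e.g. 'https://www.acme.com' -> 'acme.com'."""
    website = website.strip()
    host = urlparse(website if "//" in website else f"//{website}").hostname or ""
    return host[4:] if host.startswith("www.") else host


def contact_key(email: str) -> str:
    """Contacts are keyed on the normalized email address."""
    return email.strip().lower()


def group_by_key(records: list, key_fn, field: str) -> dict:
    """Group records by key and keep only groups that contain possible duplicates."""
    groups = defaultdict(list)
    for record in records:
        key = key_fn(record.get(field, ""))
        if key:  # records without a usable key are left alone
            groups[key].append(record)
    return {key: group for key, group in groups.items() if len(group) > 1}


companies = [
    {"name": "Acme", "website": "https://www.acme.com"},
    {"name": "Acme Inc", "website": "acme.com/about"},
    {"name": "Globex", "website": "globex.io"},
]
company_dupes = group_by_key(companies, company_key, "website")
```

Here the two Acme rows land in one group under acme.com, and Globex is left untouched.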

Attio keeps an audit trail, but in practice a merge is a merge. That's why we dry-run everything first.

Related integrations

Events

Gatsby → Attio

Member-network events that land in Attio the second they happen

Outbound

Lemlist → Attio

Outbound sequences that respect the CRM

Want this running on your Attio?

Book a free 30-min call. We'll map your use case to what we've already shipped and tell you whether this fits - honestly.
