Archive migrations rarely fail because of tooling. They struggle because of data quality.

Years of organic growth, PST imports, mergers and policy changes leave most legacy archives in less-than-perfect shape. Understanding common data-quality issues before migration reduces cost, delay and compliance risk.

Here are ten we see repeatedly:

1. Duplicate messages

Multiple ingestion paths, such as journaling alongside PST imports, store the same message more than once and inflate archive size.

Solution: Deduplicate before or during migration.
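As a rough illustration of deduplication, the sketch below keeps the first copy of each message in a batch of raw RFC 5322 messages. It assumes that the Message-ID header is a good enough identity where present, and falls back to a hash of a few headers plus the body where it is not. The function names and the choice of fallback fields are hypothetical, and a real archive platform may well use a different single-instancing rule.

```python
import hashlib
from email import message_from_bytes
from email.message import Message


def message_fingerprint(msg: Message) -> str:
    """Return a dedup key: the Message-ID if present, otherwise a content hash."""
    message_id = str(msg.get("Message-ID") or "").strip()
    if message_id:
        return "id:" + message_id
    # Fallback (assumed field choice): hash sender, date, subject and body.
    headers = [str(msg.get(name, "")) for name in ("From", "Date", "Subject")]
    # get_payload(decode=True) returns None for multipart containers.
    body = msg.get_payload(decode=True) or b""
    digest = hashlib.sha256("\n".join(headers).encode("utf-8") + b"\n" + body)
    return "sha256:" + digest.hexdigest()


def deduplicate(raw_messages: list[bytes]) -> list[bytes]:
    """Keep the first copy of each message, preserving input order."""
    seen: set[str] = set()
    unique: list[bytes] = []
    for raw in raw_messages:
        key = message_fingerprint(message_from_bytes(raw))
        if key not in seen:
            seen.add(key)
            unique.append(raw)
    return unique
```

Keying on Message-ID means copies that picked up extra headers on different ingestion paths still collapse to one item. Whether that is acceptable for a given compliance regime is a policy decision, not a technical one.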

2. Orphaned mailboxes

Historic data belonging to users who have left the organisation or no longer exist in the directory.

Solution: Establish ownership and mapping rules early.
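One way to make ownership rules explicit is a simple lookup that runs before extraction. The sketch below is illustrative only: `resolve_owners`, the `mapping_rules` dictionary and the `pending-review` fallback are hypothetical names, and the real inputs would come from the archive's mailbox list and the current directory.

```python
def resolve_owners(archived_owners, active_users, mapping_rules, fallback="pending-review"):
    """Map each archived mailbox owner to a target owner.

    Active users keep their own data. Orphaned owners follow an explicit
    mapping rule, or go to a fallback bucket for manual review.
    """
    resolved = {}
    for owner in archived_owners:
        if owner in active_users:
            resolved[owner] = owner
        else:
            resolved[owner] = mapping_rules.get(owner, fallback)
    return resolved
```

Anything that lands in the fallback bucket is a decision still to be made, which is easier to deal with before migration than during it.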

3. Inconsistent retention policies

Different rules applied across time periods.

Solution: Audit and rationalise before migration.

4. Corrupt PST files

Legacy PSTs often contain broken indexing or partial data.

Solution: Validate and repair prior to ingestion.

5. Incomplete journal data

Journal archives may have gaps due to historic misconfiguration.

Solution: Conduct integrity checks before extraction.
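A basic integrity check is to look for calendar days with no journaled messages at all. The sketch below assumes the message dates have already been extracted from the journal archive; `find_gap_days` is a hypothetical helper. A quiet day is not proof of data loss (weekends and holidays can be legitimately empty), so treat the output as a list of days to investigate.

```python
from collections import Counter
from datetime import date, timedelta


def find_gap_days(message_dates: list[date]) -> list[date]:
    """Return every day between the first and last message that has no messages."""
    if not message_dates:
        return []
    counts = Counter(message_dates)
    day, last = min(counts), max(counts)
    gaps = []
    while day <= last:
        # Counter returns 0 for missing keys without inserting them.
        if counts[day] == 0:
            gaps.append(day)
        day += timedelta(days=1)
    return gaps
```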

6. Oversized attachments

Large legacy files inflate storage and slow ingestion.

Solution: Assess attachment policies and consider optimisation.
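To assess the scale of the problem, it can help to list attachments above a size threshold before deciding on a policy. The sketch below uses Python's standard `email` package on a raw message. The function name and the threshold are assumptions for illustration, and only parts that carry a filename are treated as attachments.

```python
from email import message_from_bytes


def oversized_attachments(raw: bytes, limit_bytes: int) -> list[tuple[str, int]]:
    """Return (filename, decoded size in bytes) for attachments larger than limit_bytes."""
    msg = message_from_bytes(raw)
    found = []
    for part in msg.walk():
        filename = part.get_filename()
        if filename is None:
            continue  # Not an attachment under this sketch's definition.
        # decode=True undoes base64 or quoted-printable transfer encoding.
        payload = part.get_payload(decode=True) or b""
        if len(payload) > limit_bytes:
            found.append((filename, len(payload)))
    return found
```

Measuring the decoded size matters because base64 transfer encoding makes an attachment roughly a third larger on the wire than it is on disk.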

7. Inconsistent folder structures

User-driven archiving creates unpredictable hierarchies.

Solution: Normalise where appropriate.

8. Timezone inconsistencies

Older systems may not preserve timestamps consistently, for example by storing local time without a UTC offset.

Solution: Validate the metadata preservation strategy.
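One simple validation is to compare a message's Date header before and after a test migration as UTC instants, not as strings. The sketch below uses the standard library's `parsedate_to_datetime`. The helper names `to_utc` and `timestamps_match` are hypothetical, and headers that carry no usable offset are rejected instead of guessed at.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def to_utc(date_header: str) -> datetime:
    """Parse an RFC 5322 Date header and return it as an aware UTC datetime."""
    parsed = parsedate_to_datetime(date_header)
    if parsed.tzinfo is None:
        # No offset (or "-0000"): the true instant is unknown, so do not guess.
        raise ValueError(f"no usable UTC offset in {date_header!r}")
    return parsed.astimezone(timezone.utc)


def timestamps_match(source_header: str, target_header: str) -> bool:
    """True if both headers describe the same instant, whatever offset they use."""
    return to_utc(source_header) == to_utc(target_header)
```

For example, `timestamps_match("Mon, 01 Jan 2018 09:00:00 +0000", "Mon, 01 Jan 2018 10:00:00 +0100")` is true, because both headers describe the same moment. A target that silently rewrote the time without adjusting the offset would fail the check.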

9. Broken indexing

Legacy archives sometimes rely on outdated indexing engines.

Solution: Reindex on ingestion into the target platform.

10. Legacy encryption

Older encryption schemes can complicate extraction.

Solution: Identify encryption handling requirements upfront.

Why does all this matter?

Ignoring data-quality issues leads to:

  • Longer migration timelines

  • Higher ingestion costs

  • Frustrated users

  • Legal defensibility concerns

Addressing them early enables smoother transitions and stronger governance in the target environment.

Migration success is rarely about speed alone. It’s about integrity, accuracy and defensibility.

The cleaner your data, the stronger your outcome.
