Phone Hash Directory

SHA-1, SHA-256, MD5 hash lookup for phone numbers

Phone Hash Formats and Normalization

Phone number hash formats determine whether your hashes will match across systems. The same phone number can be written in dozens of ways: with or without a country code, with spaces or dashes, with or without a leading zero. If each system hashes a different string, lookups fail. This guide covers best practices for formatting phone numbers before hashing, the international format standard, and normalization rules for developers.

Why Format Matters for Hashing

Consider the US number 555-123-4567. It could be stored as:

  • 5551234567
  • 555-123-4567
  • (555) 123-4567
  • +1 555 123 4567
  • +15551234567
  • 1-555-123-4567

Each string produces a different hash. If System A hashes +15551234567 and System B hashes 5551234567, their hashes will not match even though both strings represent the same number. Normalization, meaning the conversion of every representation to a single canonical form before hashing, is essential.
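To make the mismatch concrete, the short sketch below hashes the six representations listed above with SHA-256 and counts the distinct digests. The helper name sha256_hex is only illustrative.

```python
import hashlib

# The six representations listed above, all for the same US number.
variants = [
    "5551234567",
    "555-123-4567",
    "(555) 123-4567",
    "+1 555 123 4567",
    "+15551234567",
    "1-555-123-4567",
]


def sha256_hex(text: str) -> str:
    """Return the lowercase hex SHA-256 digest of the UTF-8 encoded text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


digests = {variant: sha256_hex(variant) for variant in variants}

for variant, digest in digests.items():
    # Print only a short prefix of each digest to keep the output readable.
    print(f"{variant!r:20} -> {digest[:16]}...")

# Six different input strings give six different digests.
distinct_digests = len(set(digests.values()))
print("distinct digests:", distinct_digests)
```

Without normalization, none of these strings can be used to look up a hash produced from any of the others.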

E.164: The International Standard

E.164 is the ITU-T recommendation for international telephone numbering. Format: +[country code][subscriber number] with no spaces, dashes, or parentheses, and at most 15 digits in total. Examples:

  • US: +15551234567
  • UK: +442071234567
  • Germany: +49301234567

E.164 is the recommended canonical format for hashing phone numbers. It is unambiguous, globally unique, and widely supported by libraries (e.g., libphonenumber).
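A cheap guard before hashing is to check that a string at least has the E.164 shape. The sketch below is a shape check only, assuming the usual written form (a leading +, a non-zero first digit, at most 15 digits); it does not confirm that the country code or the number actually exists. The function name looks_like_e164 is an illustrative choice.

```python
import re

# Shape check only: "+", a non-zero first digit, at most 15 digits in total.
# ASCII digits are spelled out so that other Unicode digits are rejected.
E164_SHAPE = re.compile(r"\+[1-9][0-9]{1,14}")


def looks_like_e164(value: str) -> bool:
    """Return True if value has the E.164 shape; this is not full validation."""
    return E164_SHAPE.fullmatch(value) is not None


for candidate in ["+15551234567", "+442071234567", "5551234567", "+1 555 123 4567"]:
    print(candidate, looks_like_e164(candidate))
```

Strings that fail this check should be normalized or rejected rather than hashed as they are.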

National Format Variations

Different countries use different conventions:

  • US/Canada: 10 digits, optional +1 prefix. Leading 1 for long-distance.
  • UK: typically 10 digits after +44 (11 when written nationally with the leading zero); the leading zero is dropped when the country code is used.
  • Germany: Variable length; leading zero dropped with +49.
  • International format for storage: always use E.164 to avoid ambiguity.

When hashing, convert national formats to E.164 first. See our international phone formatting and country code reference for details.

Normalization Rules

Before hashing, apply these rules consistently:

  1. Strip non-digits except leading +: remove spaces, dashes, parentheses, dots.
  2. Add country code if missing: infer from context (e.g., default to +1 for US) or require explicit input.
  3. Remove leading zeros that belong to national formatting and are not part of the international number (e.g., the UK leading 0: 020 7123 4567 becomes +442071234567).
  4. Use UTF-8 encoding for the final string before hashing.

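Example normalization (worked trace):

Input:  "(555) 123-4567"
Step 1: 5551234567
Step 2: +15551234567 (assuming US)
Step 3: no national leading zero to remove
Step 4: encode the string as UTF-8 bytes before hashing
Output: +15551234567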
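The rules and trace above can be sketched in Python as follows. This is a minimal illustration, not a production normalizer: the REGION_RULES table is hypothetical and covers only three regions, and the function names normalize and hash_phone are assumptions made for this example. For real international data, a library such as libphonenumber is the safer choice.

```python
import hashlib
import re

# Hypothetical, deliberately tiny rule table: region -> (country code, national prefix).
REGION_RULES = {
    "US": ("1", "1"),   # the long-distance "1" doubles as the country code
    "GB": ("44", "0"),
    "DE": ("49", "0"),
}


def normalize(raw: str, default_region: str = "US") -> str:
    """Apply rules 1-3 and return an E.164-style string."""
    has_plus = raw.strip().startswith("+")
    digits = re.sub(r"[^0-9]", "", raw)                # rule 1: strip non-digits
    if not digits:
        raise ValueError("no digits in input")
    if has_plus:
        # Already international: only undo a stray zero, as in "+44 (0)20 ...".
        for country_code, prefix in REGION_RULES.values():
            if prefix == "0" and digits.startswith(country_code + "0"):
                digits = country_code + digits[len(country_code) + 1:]
                break
        return "+" + digits
    country_code, prefix = REGION_RULES[default_region]
    if prefix and digits.startswith(prefix):           # rule 3: drop national prefix
        digits = digits[len(prefix):]
    return "+" + country_code + digits                 # rule 2: add country code


def hash_phone(raw: str, default_region: str = "US") -> str:
    """Normalize, encode as UTF-8 (rule 4), and return the SHA-256 hex digest."""
    canonical = normalize(raw, default_region)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


print(normalize("(555) 123-4567"))
print(normalize("020 7123 4567", default_region="GB"))
```

With this sketch, every US representation shown earlier in the guide normalizes to the same string and therefore to the same hash.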

Hash Format by Algorithm

The hash format (output structure) depends on the algorithm:

  • MD5: 32 lowercase hex characters
  • SHA-1: 40 lowercase hex characters
  • SHA-256: 64 lowercase hex characters

Some systems use uppercase hex; for interoperability, lowercase is preferred. Ensure your lookup directory and API document which format they expect.
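The lengths above can be checked directly with Python's hashlib, whose hexdigest() returns lowercase hex. The same_hash helper is an illustrative assumption showing one way to compare against a system that emits uppercase hex.

```python
import hashlib

# Canonical E.164 string, encoded as UTF-8 before hashing.
canonical = "+15551234567".encode("utf-8")

digests = {
    "md5": hashlib.md5(canonical).hexdigest(),
    "sha1": hashlib.sha1(canonical).hexdigest(),
    "sha256": hashlib.sha256(canonical).hexdigest(),
}
lengths = {name: len(value) for name, value in digests.items()}
print(lengths)


def same_hash(left: str, right: str) -> bool:
    """Compare two hex digests while ignoring case and surrounding whitespace."""
    return left.strip().lower() == right.strip().lower()


# An uppercase digest from another system still matches after case folding.
print(same_hash(digests["sha256"].upper(), digests["sha256"]))
```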

Handling Edge Cases

  • Short codes: Numbers like 411 or 911 may not fit E.164. Define project-specific rules (e.g., hash as-is or exclude from hashing).
  • Extensions: E.164 does not include extensions (e.g., x123). Strip or append consistently.
  • Invalid numbers: Validate before hashing. An invalid number still hashes without error, but the resulting hash will not match any record built from validated numbers.
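One possible way to handle the first two cases is sketched below. The extension pattern, the function names, and the six-digit short-code threshold are all assumptions for illustration; a real project would set these rules per country and document them.

```python
import re
from typing import Optional, Tuple

# Matches a trailing extension such as "x123", "ext 123" or "ext. 123".
EXTENSION = re.compile(r"\s*(?:ext\.?|x)\s*([0-9]+)\s*$", re.IGNORECASE)


def split_extension(raw: str) -> Tuple[str, Optional[str]]:
    """Return (number without extension, extension digits or None)."""
    match = EXTENSION.search(raw)
    if match is None:
        return raw.strip(), None
    return raw[:match.start()].strip(), match.group(1)


def is_short_code(raw: str, max_digits: int = 6) -> bool:
    """Hypothetical rule: no leading + and at most max_digits digits."""
    digits = re.sub(r"[^0-9]", "", raw)
    return not raw.strip().startswith("+") and 0 < len(digits) <= max_digits


print(split_extension("555-123-4567 x123"))
print(is_short_code("911"))
```

Whichever rule you pick (strip the extension, hash it separately, or exclude short codes), apply it on both sides of the lookup.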

Integration with Hash Lookup

When integrating with our hash directory or reverse lookup, use the same normalization we use. Our systems normalize to E.164 before hashing. If your format differs, your hashes will not match. Document your normalization in API contracts and integration guides.

Libraries for Normalization

  • libphonenumber (Google): Java, C++, JavaScript. Handles parsing, validation, and E.164 formatting for most countries.
  • phonenumbers (Python): Python port of libphonenumber.
  • giggsey/libphonenumber-for-php: PHP implementation.

Use these rather than custom regex to handle international edge cases correctly.
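As a rough sketch of what library-based normalization looks like, the example below uses the Python phonenumbers package (a third-party install). The wrapper name to_e164 and the choice to return None for unparseable or invalid input are assumptions made here, not part of the library. Placeholder numbers may be rejected by the validity check, so the usage line uses a London-style number.

```python
from typing import Optional

try:
    import phonenumbers  # third-party: pip install phonenumbers
except ImportError:      # keep the sketch importable where the package is absent
    phonenumbers = None


def to_e164(raw: str, default_region: str) -> Optional[str]:
    """Parse raw for the given region and return E.164, or None if unusable."""
    if phonenumbers is None:
        raise RuntimeError("the phonenumbers package is not installed")
    try:
        parsed = phonenumbers.parse(raw, default_region)
    except phonenumbers.NumberParseException:
        return None  # input could not be parsed as a phone number at all
    if not phonenumbers.is_valid_number(parsed):
        return None  # parsed, but not a valid number for its region
    return phonenumbers.format_number(parsed, phonenumbers.PhoneNumberFormat.E164)


if phonenumbers is not None:
    print(to_e164("020 7123 4567", "GB"))
```

The region argument plays the role of the "infer from context" step in the normalization rules: it is only used when the input carries no country code.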

Best Practices Summary

  • Standardize on E.164 as the canonical input for every phone hash.
  • Document your rules in API docs and integration guides.
  • Validate before hashing to catch malformed input early.
  • Use established libraries for international number handling.

Format Versioning

When you change normalization rules (e.g., adding country code inference or changing how you handle extensions), version your format. Include a format version in API responses, exports, and metadata so consumers know which rules produced a given hash, and consider a format version field in your hash storage schema for traceability. When debugging a mismatch, the version recorded in logs or database records tells you which rules were in effect at the time, so you can determine whether the cause is format version A vs. B. This is especially valuable in long-lived systems where multiple format versions coexist during a migration.

When consuming data from external sources, check for format version metadata. If it is absent, assume the source may have used different normalization, document that assumption, and add validation to detect format mismatches. Over time, format versioning reduces support burden by making debugging more systematic, and it helps new team members understand which normalization produced a given hash. Include the format version in your data lineage and catalog documentation so analysts and engineers can trace the provenance of hash values.
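A minimal sketch of format versioning is shown below. The two rule sets (normalize_v1, normalize_v2), the PhoneHashRecord fields, and the helper names are hypothetical; the point is only that the version stored next to the hash selects the rules used when re-hashing for comparison.

```python
import hashlib
import re
from dataclasses import dataclass


def normalize_v1(raw: str) -> str:
    """Hypothetical v1 rules: keep digits only, no country code inference."""
    return re.sub(r"[^0-9]", "", raw)


def normalize_v2(raw: str) -> str:
    """Hypothetical v2 rules: digits only, then assume US (+1) for 10 digits."""
    digits = re.sub(r"[^0-9]", "", raw)
    if len(digits) == 10:
        digits = "1" + digits
    return "+" + digits


NORMALIZERS = {1: normalize_v1, 2: normalize_v2}


@dataclass(frozen=True)
class PhoneHashRecord:
    phone_hash: str
    algorithm: str
    format_version: int  # which normalization rules produced phone_hash


def make_record(raw: str, format_version: int) -> PhoneHashRecord:
    canonical = NORMALIZERS[format_version](raw)
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return PhoneHashRecord(digest, "sha256", format_version)


def matches(raw: str, record: PhoneHashRecord) -> bool:
    """Re-hash raw with the rules named in the record, then compare."""
    return make_record(raw, record.format_version).phone_hash == record.phone_hash


old = make_record("(555) 123-4567", format_version=1)
new = make_record("(555) 123-4567", format_version=2)
print(old.phone_hash == new.phone_hash)
print(matches("555-123-4567", old), matches("+1 555 123 4567", new))
```

Because the record carries its own version, a mismatch between old and new is explained by the metadata instead of requiring guesswork.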

Format Documentation in APIs

When building or consuming APIs that handle phone hashes, document the format explicitly. Include in your API spec: (1) normalization rules (E.164 or other), (2) hash algorithm (MD5, SHA-1, SHA-256), (3) hex encoding (lowercase preferred), (4) any prefixes or suffixes (e.g., some systems use an "md5:" prefix). Inconsistent documentation causes integration failures. Publish a format specification and version it when you change normalization or algorithm.
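One way to make such a specification machine-checkable is sketched below. The FORMAT_SPEC keys and the "sha256:" prefix are hypothetical choices for illustration, not a published format; parse_hash simply enforces the documented prefix and digest length and returns lowercase hex.

```python
import re

# Expected hex digest length per algorithm.
HEX_LENGTHS = {"md5": 32, "sha1": 40, "sha256": 64}

# Hypothetical published specification covering the four items above.
FORMAT_SPEC = {
    "spec_version": "1",
    "normalization": "E.164",
    "algorithm": "sha256",
    "hex_case": "lower",
    "prefix": "sha256:",
}


def parse_hash(value: str, spec: dict = FORMAT_SPEC) -> str:
    """Validate value against spec and return the bare lowercase hex digest."""
    prefix = spec["prefix"]
    if prefix:
        if not value.startswith(prefix):
            raise ValueError(f"expected prefix {prefix!r}")
        value = value[len(prefix):]
    hex_part = value.lower()  # accept uppercase input, store lowercase
    expected = HEX_LENGTHS[spec["algorithm"]]
    if not re.fullmatch(r"[0-9a-f]{%d}" % expected, hex_part):
        raise ValueError(f"expected {expected} hex characters")
    return hex_part


print(parse_hash("sha256:" + "AB" * 32))
```

Rejecting a value with the wrong prefix or length at the API boundary surfaces format disagreements early, before they show up as silent lookup misses.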

Handling Multiple Formats in Legacy Systems

Legacy systems may store phone numbers in various formats—some with country code, some without; some with formatting, some stripped. When migrating to a unified hash-based system, you have two options. First: normalize at read time—when querying, normalize each legacy format to E.164 and hash. This can be slow for large datasets. Second: backfill—run a one-time migration that normalizes and hashes all existing records, then enforce E.164 for new data. Document which approach you use and any format-specific edge cases.
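The backfill option can be sketched with an in-memory SQLite table. The table and column names (contacts, phone, phone_hash) and the US-only normalizer to_e164_us are assumptions for this example; rows that cannot be normalized are reported instead of hashed.

```python
import hashlib
import re
import sqlite3
from typing import List, Optional


def to_e164_us(raw: str) -> Optional[str]:
    """Hypothetical US-only normalizer; returns None when the input does not fit."""
    digits = re.sub(r"[^0-9]", "", raw)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    if len(digits) != 10:
        return None
    return "+1" + digits


def backfill(conn: sqlite3.Connection) -> List[int]:
    """Hash every row without a hash; return ids that could not be normalized."""
    failed = []
    rows = conn.execute(
        "SELECT id, phone FROM contacts WHERE phone_hash IS NULL ORDER BY id"
    ).fetchall()
    for row_id, phone in rows:
        canonical = to_e164_us(phone)
        if canonical is None:
            failed.append(row_id)
            continue
        digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
        conn.execute("UPDATE contacts SET phone_hash = ? WHERE id = ?", (digest, row_id))
    conn.commit()
    return failed


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (id INTEGER PRIMARY KEY, phone TEXT, phone_hash TEXT)")
conn.executemany(
    "INSERT INTO contacts (phone) VALUES (?)",
    [("(555) 123-4567",), ("1-555-123-4567",), ("+15551234567",), ("12345",)],
)
failed_ids = backfill(conn)
distinct = conn.execute(
    "SELECT COUNT(DISTINCT phone_hash) FROM contacts WHERE phone_hash IS NOT NULL"
).fetchone()[0]
print("failed ids:", failed_ids, "distinct hashes:", distinct)
```

Because the query only selects rows where phone_hash is still empty, the migration can be rerun safely after the failed rows have been corrected.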

For more on international formats, see international phone formatting. To browse hashes and perform lookups, visit /hashes and /phones.
