Phone Hash Directory

SHA-1, SHA-256, MD5 hash lookup for phone numbers

Hash Validation Best Practices

Before performing hash lookup or storing hashes in your system, validate the input. Invalid hashes cause lookup failures, pipeline errors, and wasted API calls. This guide covers hash validation best practices: format checks, length validation, algorithm-specific rules, and defensive coding for developers.

Why Validate Hashes?

Hash validation prevents:

  • Lookup failures: Malformed hashes never match; validate early to fail fast.
  • Security issues: Reject potentially malicious input (e.g., injection, oversized strings).
  • Data corruption: Catch encoding errors, truncation, or copy-paste mistakes before they propagate.
  • API waste: Invalid requests consume rate limits and return errors; validate client-side when possible.

Format Validation by Algorithm

Each algorithm produces a specific format:

Algorithm   Hex length   Character set   Example
MD5         32           [0-9a-f]        5d41402abc4b2a76b9719d911017c592
SHA-1       40           [0-9a-f]        2fd4e1c67a2d28fced849ee1bb76e7391b93eb12
SHA-256     64           [0-9a-f]        e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855

Validation rules:
- Exact length: 32, 40, or 64 hex characters for MD5, SHA-1, and SHA-256 respectively.
- Lowercase hexadecimal only (some systems emit uppercase; if you accept it, normalize to lowercase).
- No spaces, prefixes (e.g., "md5:"), or suffixes.
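As a quick cross-check of the table, Python's standard hashlib module shows where the lengths come from: each hex character encodes 4 bits, so a 16-, 20-, or 32-byte digest becomes 32, 40, or 64 hex characters. The inputs below are the ones that produce the table's example digests. This is a sanity-check script, not part of a validation pipeline.

```python
import hashlib

# Inputs that reproduce the example digests in the table above.
TABLE_EXAMPLES = {
    "md5": b"hello",
    "sha1": b"The quick brown fox jumps over the lazy dog",
    "sha256": b"",
}

# Each hex character encodes 4 bits, so hex length = 2 * digest size in bytes.
EXPECTED_HEX_LENGTH = {"md5": 32, "sha1": 40, "sha256": 64}


def hex_digest(algorithm: str, data: bytes) -> str:
    """Return the lowercase hex digest of data for a hashlib algorithm name."""
    return hashlib.new(algorithm, data).hexdigest()


for name, data in TABLE_EXAMPLES.items():
    digest = hex_digest(name, data)
    assert len(digest) == EXPECTED_HEX_LENGTH[name] == 2 * hashlib.new(name).digest_size
    print(f"{name}: {digest}")
```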

Regex Patterns for Validation

MD5:

^[a-f0-9]{32}$

SHA-1:

^[a-f0-9]{40}$

SHA-256:

^[a-f0-9]{64}$

Case-insensitive (if you accept uppercase):

^[a-fA-F0-9]{32}$   # MD5
^[a-fA-F0-9]{40}$   # SHA-1
^[a-fA-F0-9]{64}$   # SHA-256

Normalize to lowercase before lookup for consistency.
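The rules and patterns above can be combined into one normalizing validator. This is a minimal Python sketch that assumes you accept uppercase input and store lowercase; the function name normalize_hash and the algorithm keys (md5, sha1, sha256) are illustrative, not part of our API. It uses re.fullmatch, so the compiled patterns need no ^ and $ anchors.

```python
import re

# re.fullmatch must consume the entire string, so no ^/$ anchors are needed.
HASH_PATTERNS = {
    "md5": re.compile(r"[a-f0-9]{32}"),
    "sha1": re.compile(r"[a-f0-9]{40}"),
    "sha256": re.compile(r"[a-f0-9]{64}"),
}


def normalize_hash(value: str, algorithm: str) -> str:
    """Trim and lowercase a hex hash, then check it against the algorithm's pattern.

    Returns the normalized hash or raises ValueError.
    """
    pattern = HASH_PATTERNS.get(algorithm)
    if pattern is None:
        raise ValueError(f"unsupported algorithm: {algorithm!r}")
    candidate = value.strip().lower()  # drop this .lower() to require lowercase input
    if pattern.fullmatch(candidate) is None:
        raise ValueError(f"not a valid {algorithm} hash")
    return candidate
```

Prefixes such as "md5:" and embedded spaces still fail, because only surrounding whitespace is trimmed.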

Validating Phone Numbers Before Hashing

Before hashing, validate the phone number:

  • Format: E.164 or your canonical format.
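  • Length: At most 15 digits including the country code (the E.164 maximum). Minimum length varies by country, so rely on a library rather than a fixed lower bound.
  • Valid country code: Use libphonenumber or similar to validate.
  • No invalid characters: Strip or reject non-numeric input (except a leading +).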

Invalid phone numbers produce hashes that may not match downstream systems. Validate early in the pipeline.
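A minimal sketch of the validate-then-hash order, using only the Python standard library. The regex checks E.164 structure only (a + sign, a non-zero first digit, at most 15 digits); it cannot tell whether a country code or number range actually exists, which is what libphonenumber (phonenumbers in Python) is for. Hashing the string with its leading + is an assumption here: use whatever canonical form your downstream systems hash, or the digests will not match. hash_e164 is a hypothetical helper name.

```python
import hashlib
import re

# Structural E.164 check only: "+", a non-zero first digit, 2 to 15 digits total.
# It does NOT confirm that the country code or number range exists.
E164_PATTERN = re.compile(r"\+[1-9][0-9]{1,14}")


def hash_e164(phone: str) -> str:
    """Validate an already-canonicalized E.164 number, then return its SHA-256 hex digest."""
    candidate = phone.strip()
    if E164_PATTERN.fullmatch(candidate) is None:
        raise ValueError("phone number is not in E.164 format")
    # Assumed convention: the hashed string includes the leading "+".
    return hashlib.sha256(candidate.encode("utf-8")).hexdigest()
```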

Algorithm Detection

When receiving a hash without explicit algorithm metadata, you can infer from length:

  • 32 chars → MD5
  • 40 chars → SHA-1
  • 64 chars → SHA-256 (but SHA3-256 and BLAKE2s-256 also produce 64 hex characters, so length alone cannot tell them apart; document your convention)

Always prefer explicit algorithm specification in API requests. Don't rely on inference when the format could be ambiguous.
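When the algorithm is declared, check that the hash length agrees with it; fall back to length-based inference only when nothing was declared. In this sketch, inference is limited to the three algorithms this guide covers, so each length maps to exactly one candidate; if your system accepts more algorithms with shared lengths, the function refuses to guess. resolve_algorithm and EXPECTED_LENGTH are illustrative names.

```python
import re
from typing import Optional

# Hex lengths for the algorithms this system accepts (assumed set).
EXPECTED_LENGTH = {"md5": 32, "sha1": 40, "sha256": 64}
HEX = re.compile(r"[0-9a-f]+")


def resolve_algorithm(hash_hex: str, declared: Optional[str] = None) -> str:
    """Return the algorithm for hash_hex, preferring the declared value."""
    if HEX.fullmatch(hash_hex) is None:
        raise ValueError("hash must be lowercase hexadecimal")
    if declared is not None:
        expected = EXPECTED_LENGTH.get(declared)
        if expected is None:
            raise ValueError(f"unsupported algorithm: {declared!r}")
        if len(hash_hex) != expected:
            raise ValueError(
                f"{declared} hashes have {expected} hex chars, got {len(hash_hex)}"
            )
        return declared
    # Fallback: infer from length, but only when exactly one accepted algorithm fits.
    matches = [name for name, n in EXPECTED_LENGTH.items() if n == len(hash_hex)]
    if len(matches) != 1:
        raise ValueError(f"cannot infer algorithm from length {len(hash_hex)}")
    return matches[0]
```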

Defensive Coding

  1. Trim whitespace: Hash inputs may have leading/trailing spaces; trim before validation and hashing.
  2. Reject empty: Empty strings and null should be rejected.
  3. Length limits: Reject excessively long inputs before running regexes or other processing, to limit denial-of-service (DoS) risk.
  4. Encoding: Ensure UTF-8 for phone number strings; reject invalid byte sequences.
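The defensive checks above can be applied in sequence to raw input that may arrive as text or bytes. This is a sketch: MAX_INPUT_CHARS and coerce_input are hypothetical names, and the 256-character limit is an arbitrary example value. The length check runs first so that oversized payloads are rejected before any decoding or regex work.

```python
from typing import Optional, Union

MAX_INPUT_CHARS = 256  # hypothetical limit, well above the longest expected input


def coerce_input(raw: Optional[Union[str, bytes]]) -> str:
    """Apply null, length, encoding, and whitespace checks before validation."""
    if raw is None:
        raise ValueError("input is required")
    # Bound the size first so later steps never process huge payloads.
    if len(raw) > MAX_INPUT_CHARS:
        raise ValueError("input too long")
    if isinstance(raw, bytes):
        try:
            raw = raw.decode("utf-8")  # strict decoding: invalid byte sequences raise
        except UnicodeDecodeError as exc:
            raise ValueError("input is not valid UTF-8") from exc
    text = raw.strip()  # trim leading/trailing whitespace
    if not text:
        raise ValueError("input is empty")
    return text
```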

Validation in API Integration

When integrating with our API (see our developer API guide):

  • Validate hash format before sending requests.
  • Include the algorithm in the request to avoid ambiguity.
  • Handle validation errors (HTTP 400) with clear user feedback.
  • Log validation failures for debugging (without logging raw PII).
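A sketch of client-side validation before a lookup request. The JSON field names hash and algorithm are assumptions for illustration; use the request format documented in the developer API guide. On failure, nothing is sent, and the log line records only the algorithm and input length.

```python
import json
import logging
import re
from typing import Optional

logger = logging.getLogger("hash_lookup")

PATTERNS = {
    "md5": re.compile(r"[a-f0-9]{32}"),
    "sha1": re.compile(r"[a-f0-9]{40}"),
    "sha256": re.compile(r"[a-f0-9]{64}"),
}


def build_lookup_body(hash_hex: str, algorithm: str) -> Optional[str]:
    """Return a JSON request body, or None if validation fails (send nothing)."""
    candidate = hash_hex.strip().lower()
    pattern = PATTERNS.get(algorithm)
    if pattern is None or pattern.fullmatch(candidate) is None:
        # Log metadata only, never the submitted value itself.
        logger.warning(
            "lookup not sent: invalid hash (algorithm=%s, length=%d)",
            algorithm, len(candidate),
        )
        return None
    # Always send the algorithm explicitly so the server never has to infer it.
    return json.dumps({"hash": candidate, "algorithm": algorithm})
```

If the server still responds with HTTP 400 (for example, because its rules are stricter), show its error message to the user rather than retrying the same request.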

Testing Validation Logic

Include unit tests for:

  • Valid hashes (all algorithms).
  • Invalid length (31, 33, 39, 41, 63, 65 chars).
  • Invalid characters (g, z, spaces, punctuation).
  • Empty and null input.
  • Case handling (uppercase hex if you accept it).
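A unittest sketch covering the cases listed above. The is_valid_hash helper is a hypothetical stand-in for your own validator, with an accept_uppercase flag so both case policies can be exercised.

```python
import re
import unittest

HEX_LENGTHS = {"md5": 32, "sha1": 40, "sha256": 64}


def is_valid_hash(value, algorithm, accept_uppercase=False):
    """Hypothetical validator under test: exact length, hex only."""
    if not isinstance(value, str) or algorithm not in HEX_LENGTHS:
        return False
    if accept_uppercase:
        value = value.lower()
    return re.fullmatch(rf"[a-f0-9]{{{HEX_LENGTHS[algorithm]}}}", value) is not None


class HashValidationTests(unittest.TestCase):
    def test_valid_hashes(self):
        for algorithm, length in HEX_LENGTHS.items():
            self.assertTrue(is_valid_hash("a" * length, algorithm))

    def test_invalid_lengths(self):
        # Covers 31/33, 39/41, and 63/65 characters.
        for algorithm, length in HEX_LENGTHS.items():
            for bad in (length - 1, length + 1):
                self.assertFalse(is_valid_hash("a" * bad, algorithm))

    def test_invalid_characters(self):
        for ch in ("g", "z", " ", "-", ":"):
            self.assertFalse(is_valid_hash("a" * 31 + ch, "md5"))

    def test_empty_and_none(self):
        self.assertFalse(is_valid_hash("", "md5"))
        self.assertFalse(is_valid_hash(None, "md5"))

    def test_case_handling(self):
        upper = "5D41402ABC4B2A76B9719D911017C592"
        self.assertFalse(is_valid_hash(upper, "md5"))
        self.assertTrue(is_valid_hash(upper, "md5", accept_uppercase=True))
```

Run it with python -m unittest followed by the module name.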

See our guide on testing and QA with phone hashes for test strategies. Add property-based tests (e.g., with Hypothesis in Python or fast-check in JavaScript) that generate random valid and invalid hashes to catch edge cases your manual tests miss. Make sure validation runs before any external API call: fail fast to avoid wasting rate limits on invalid input.
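Hypothesis and fast-check generate inputs and shrink failing cases automatically. As a dependency-free stand-in (not a replacement for those libraries), the sketch below uses Python's random module with a fixed seed to generate valid MD5 hashes and mutated invalid ones, then checks that the validator agrees with each. The invalid generator deliberately includes uppercase letters and a newline, which catch case-handling bugs and the trailing-newline behavior of $ in Python's re.match.

```python
import random
import re

MD5_PATTERN = re.compile(r"[a-f0-9]{32}")


def is_valid_md5(value: str) -> bool:
    return MD5_PATTERN.fullmatch(value) is not None


def random_valid_md5(rng: random.Random) -> str:
    return "".join(rng.choice("0123456789abcdef") for _ in range(32))


def random_invalid_md5(rng: random.Random) -> str:
    """Mutate a valid hash: change its length or inject one non-hex character."""
    value = random_valid_md5(rng)
    if rng.random() < 0.5:
        delta = rng.randint(1, 8)  # wrong length: 24-31 or 33-40 characters
        return value[:-delta] if rng.random() < 0.5 else value + "a" * delta
    position = rng.randrange(32)
    bad_char = rng.choice("ghijklmnopqrstuvwxyzABCDEF :-\n")
    return value[:position] + bad_char + value[position + 1:]


def check_properties(iterations: int = 1000, seed: int = 0) -> None:
    rng = random.Random(seed)  # fixed seed so any failure is reproducible
    for _ in range(iterations):
        valid = random_valid_md5(rng)
        assert is_valid_md5(valid), valid
        invalid = random_invalid_md5(rng)
        assert not is_valid_md5(invalid), repr(invalid)
```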

Validation in Event-Driven Architectures

In event-driven or message-based systems, validate at the earliest point, ideally when the event is produced. Invalid hashes that propagate through queues and topics waste downstream processing and complicate error handling. Validate at the producer and reject invalid events before they enter the pipeline.

Include validation in your schema or contract (e.g., JSON Schema, Avro) so that invalid payloads are rejected by the message broker or gateway when possible. Schema-based validation catches format errors before application code runs. For example, a JSON Schema can require that a hash field match the pattern ^[a-f0-9]{32}$ for MD5. Avro schemas enforce types but have no regex constraint, so pattern checks on Avro payloads must live in application code. Confluent Schema Registry and similar tools support schema validation at produce time.

Combine schema validation with application-level validation for defense in depth. Schema validation catches structural errors (wrong type, wrong pattern). Application validation can enforce business rules (e.g., the hash must exist in our index, the user must have permission to look it up). Neither alone is sufficient: schema validation can't check existence, and application validation may run after invalid data has already propagated. Use both layers and ensure each runs at the right point in your pipeline.

Document your validation strategy in your architecture documentation. Include what is validated (hash format, phone format, etc.), where validation runs (API gateway, application, message consumer), and what happens when validation fails (reject, log, alert). Document the order of validation layers and any dependencies between them. When adding new validation, check that it doesn't duplicate or conflict with existing layers. Review the validation strategy during architecture reviews and when onboarding new services.
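A sketch of producer-side validation. The schema fragment shows how the MD5 pattern rule can be expressed in JSON Schema, where pattern is not implicitly anchored (hence ^ and $); it is included for reference and is not evaluated by the code. produce_lookup_event mirrors the same rules in application code and only calls publish, a hypothetical stand-in for your broker client's send method, when the event is valid. The field names are assumptions.

```python
import json
import re
from typing import Callable

# JSON Schema fragment you might register with a schema registry or gateway.
# Reference only: the function below does not evaluate it.
LOOKUP_EVENT_SCHEMA = {
    "type": "object",
    "required": ["hash", "algorithm"],
    "properties": {
        "hash": {"type": "string", "pattern": "^[a-f0-9]{32}$"},
        "algorithm": {"const": "md5"},
    },
    "additionalProperties": False,
}

MD5 = re.compile(r"[a-f0-9]{32}")


def produce_lookup_event(event: dict, publish: Callable[[bytes], None]) -> bool:
    """Validate at the producer; only well-formed events reach the queue."""
    if set(event) != {"hash", "algorithm"} or event["algorithm"] != "md5":
        return False
    value = event["hash"]
    if not isinstance(value, str) or MD5.fullmatch(value) is None:
        return False
    publish(json.dumps(event).encode("utf-8"))
    return True
```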

Validation in Different Languages

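The same patterns work in most languages, with a few engine-specific details:

  • Python: Use re.fullmatch(r'[a-f0-9]{32}', hash_str) for MD5. With re.match and ^...$, the $ also matches before a trailing newline, so a hash ending in \n would pass. For phone validation, use the phonenumbers library.
  • JavaScript: Use the anchored pattern with RegExp, e.g. /^[a-f0-9]{32}$/.test(hashStr). Without the m flag, $ matches only at the end of the string.
  • Go: Use regexp.MustCompile with the anchored pattern and call MatchString.
  • Java: Use Pattern.compile and Matcher.matches(), which requires the entire input to match.

Escaping rules differ between regex dialects and string-literal syntaxes, so test with known valid and invalid inputs in each language.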
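One Python-specific pitfall is easy to reproduce: $ matches at the end of the string and also just before a trailing newline, so re.match with ^...$ accepts a hash followed by \n. re.fullmatch, or an explicit \Z anchor, rejects it.

```python
import re

value = "5d41402abc4b2a76b9719d911017c592\n"  # note the trailing newline

# "$" also matches just before a final newline, so this accepts the bad input.
print(bool(re.match(r"^[a-f0-9]{32}$", value)))    # True
# fullmatch requires the pattern to consume the whole string.
print(bool(re.fullmatch(r"[a-f0-9]{32}", value)))  # False
# \Z anchors at the absolute end of the string.
print(bool(re.match(r"[a-f0-9]{32}\Z", value)))    # False
```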

Logging and Monitoring

Log validation failures for debugging and security monitoring. A sudden spike in invalid hash submissions may indicate a bug in an upstream system or an attempted attack. Include in logs: timestamp, source (IP or service), hash length, and error type. Do not log raw phone numbers. Aggregate metrics (e.g., validation failure rate) help detect pipeline issues early.
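A sketch of the logging and metrics described above, using the standard logging module and a counter. ValidationMetrics is a hypothetical helper; the timestamp comes from the log formatter (%(asctime)s), and only the source, input length, and error type are written, never the raw value. failure_rate() is the aggregate you would export to your monitoring system.

```python
import logging
from collections import Counter
from typing import Optional

# Demo configuration: the formatter supplies the timestamp for each record.
logging.basicConfig(format="%(asctime)s %(levelname)s %(message)s")
logger = logging.getLogger("hash_validation")


class ValidationMetrics:
    """Counts validation outcomes so a failure-rate metric can be exported."""

    def __init__(self) -> None:
        self.total = 0
        self.failures: Counter = Counter()  # error type -> count

    def record(self, source: str, raw_value: str, error_type: Optional[str] = None) -> None:
        """Record one validation result; error_type=None means it passed."""
        self.total += 1
        if error_type is None:
            return
        self.failures[error_type] += 1
        # Log metadata only: source, length, and error type, never the raw value.
        logger.warning(
            "hash validation failed source=%s length=%d error=%s",
            source, len(raw_value), error_type,
        )

    def failure_rate(self) -> float:
        return sum(self.failures.values()) / self.total if self.total else 0.0
```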

Summary

Validate hash format (length, character set) and algorithm before lookup. Validate phone numbers before hashing. Use regex or library checks, and fail fast on invalid input. For implementation details, see our phone hash formats guide and developer API guide. To perform lookups with validated hashes, visit /hashes and /reverse-lookup.