What a breach check really sends

Typing your password into a website to find out if it is safe sounds like a spectacularly bad idea. For most of the internet's history it would have been. Here is the trick that makes it safe now.

Billions of real passwords have leaked. Every big breach dumps another pile of them onto the internet, and they get collected, indexed, and traded. Attackers do not start by guessing. They start by trying the passwords that have already leaked, because an enormous number of people reuse them. So "has this exact password already shown up in a breach" is one of the most useful questions you can ask about it.

The problem is obvious the second you think about it. To check whether a password has leaked, something somewhere has to compare it against the list of leaked passwords. And if checking means handing your password to a server, you have just done the one thing you were trying to avoid. You would be mailing your house key to a stranger to ask whether your lock is any good.

The clever part is that a good breach check never actually receives your password. Not the password, not even a full fingerprint of it. It learns just enough to answer the question and nothing more, using a method called k-anonymity.

How can a site check my password without seeing it?

The short version: your device does the sensitive part, and only a tiny, deliberately ambiguous clue ever leaves it.

First your browser scrambles the password into a fixed-length fingerprint, called a hash, right there on your machine. The password itself never moves. Then, instead of sending the whole fingerprint, your browser sends only the first five characters of it. The breach service looks up every leaked fingerprint that starts with those same five characters and sends back the whole list of them, often several hundred. Your browser compares that list against the rest of your fingerprint locally, on your device, and sees whether yours is in there.

The service answered your question. It told you which leaked fingerprints share your five-character prefix. But it never learned which one was yours, or whether any of them was yours at all, because you did the final comparison yourself and never told it the result.

The k-anonymity range query, step by step

The password is hashed locally with SHA-1, producing a 40-character hex string. The client splits it: the first 5 hex characters are the prefix, the remaining 35 are the suffix.
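The hash-and-split step can be sketched in a few lines. This is a minimal illustration, not any particular tool's code: the function name `split_hash` is hypothetical, and uppercasing the hex digest is an assumption made so the pieces read like the `ABCDE` prefix in the range query.

```python
import hashlib


def split_hash(password: str) -> tuple[str, str]:
    """Hash a password locally and split the digest into (prefix, suffix)."""
    # SHA-1 yields 20 bytes, written as 40 hex characters.
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    # First 5 hex characters are the prefix, the remaining 35 the suffix.
    return digest[:5], digest[5:]


prefix, suffix = split_hash("password")  # deliberately weak example input
print(prefix)       # 5BAA6
print(len(suffix))  # 35
```

Only `prefix` would ever be sent anywhere. `suffix` stays in memory on the device for the later comparison.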

Only the 5-character prefix is sent, as a range query: GET /range/ABCDE. The server returns every suffix in its breach corpus whose hash shares that prefix, each with a count of how many times it has been seen in breaches. The client searches that returned list for its own suffix. A match means the password is in the corpus, and the count tells you how badly it is burned.

The password never leaves the device. The full hash never leaves the device. Only 5 hex characters do, and they map to a crowd of possible passwords (see "The math of the prefix crowd" below), so the server cannot identify yours.
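The whole exchange can be simulated without a network. The sketch below is an illustration under stated assumptions: `TOY_BREACHES` is a made-up three-entry corpus, `range_query` is a hypothetical stand-in for the server's `GET /range/<prefix>` handler, and the `SUFFIX:COUNT` line format is an assumed response encoding, since the text above only says each suffix comes with a count.

```python
import hashlib


def sha1_hex(password: str) -> str:
    """Uppercase 40-character SHA-1 hex digest of a password."""
    return hashlib.sha1(password.encode("utf-8")).hexdigest().upper()


# Hypothetical toy corpus: password -> times seen in breaches.
TOY_BREACHES = {"password": 3, "123456": 5, "qwerty": 2}
# What the server would actually hold: full hash -> count.
TOY_INDEX = {sha1_hex(p): n for p, n in TOY_BREACHES.items()}


def range_query(prefix: str) -> str:
    """Server side: receives only the 5-character prefix."""
    lines = [
        f"{full[5:]}:{count}"  # assumed SUFFIX:COUNT format
        for full, count in TOY_INDEX.items()
        if full.startswith(prefix)
    ]
    return "\n".join(lines)


def breach_count(password: str) -> int:
    """Client side: hash, send the prefix, compare suffixes locally."""
    full = sha1_hex(password)
    prefix, suffix = full[:5], full[5:]
    body = range_query(prefix)  # the only value that crosses the boundary
    for line in body.splitlines():
        candidate, _, count = line.partition(":")
        if candidate == suffix:
            return int(count)
    return 0  # not in the corpus; the server never learns this result


print(breach_count("password"))  # 3
```

Note that `range_query` has no parameter through which the suffix, the password, or the match result could reach it. That is the design choice the method depends on.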

Why five characters is enough to hide in

The whole thing rests on those five characters being too vague to pin anything down. And they are, by a wide margin. A five-character prefix is shared by a huge number of different fingerprints, which means it is shared by a huge number of different passwords. When your browser sends that prefix, the service sees a clue that points at a crowd, not at you.

It is the difference between telling someone your phone number and telling them your area code. The area code narrows you to a region of millions. It identifies nobody.

The math of the prefix crowd

A SHA-1 hash is 160 bits, written as 40 hex characters. The 5-character prefix is 20 bits, so there are 16⁵ = 1,048,576 possible prefixes. The breach corpus holds well over half a billion unique hashes. Spread across roughly a million prefixes, any given prefix returns several hundred suffixes on average.

So when the server sees your prefix, it knows only that your hash begins with those five characters. If your password has leaked, its hash is one of the few hundred the server just returned. If it has not, it could be any of the astronomically many other hashes that share the prefix. The server has no way to tell which suffix you were checking, or whether you matched any of them, because the matching happens on your device and the result is never reported back. The privacy does not depend on trusting the server. It is a property of how little the server is given.
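The bucket-size arithmetic in this section is easy to check. The corpus size below is an assumption: it uses exactly half a billion as a conservative floor for "well over half a billion", so the real average is higher.

```python
HEX_DIGITS = 16
PREFIX_LEN = 5

prefix_bits = PREFIX_LEN * 4              # each hex character carries 4 bits
prefix_count = HEX_DIGITS ** PREFIX_LEN   # number of distinct prefixes

ASSUMED_CORPUS = 500_000_000              # assumed lower bound, not a real figure
avg_bucket = ASSUMED_CORPUS / prefix_count

print(prefix_bits)        # 20
print(prefix_count)       # 1048576
print(round(avg_bucket))  # 477
```

Even at this floor, each prefix is shared by several hundred leaked hashes, which is the crowd the server sees instead of you.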

Wait, isn't SHA-1 broken?

If you know a little security, this is the moment you get twitchy. SHA-1 has been considered broken for years, and using it usually marks a tool as out of date or careless. Here it is fine, for a reason that is easy to miss.

SHA-1 is broken for collision resistance, which matters when you are using a hash to prove a file has not been tampered with. That is not what is happening here. In a breach check the hash is not protecting anything. It is just a shared, agreed-upon way to turn passwords into comparable fingerprints so a prefix can be matched. The protection comes entirely from the k-anonymity, from sending only a fragment that points at a crowd. SHA-1 is simply the format the breach databases were built in, so it is the format the lookup uses. Swapping it for a "stronger" hash would not make the check more private and would just break compatibility with the data.
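The compatibility point can be seen directly. In this small sketch, the same password hashed with SHA-256 yields a digest of a different length and an unrelated prefix, so it could not be matched against a corpus indexed by SHA-1. The example password is deliberately weak and chosen only for illustration.

```python
import hashlib

password = "password"  # deliberately weak example input

sha1_digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
sha256_digest = hashlib.sha256(password.encode("utf-8")).hexdigest().upper()

print(len(sha1_digest), sha1_digest[:5])      # 40 5BAA6
print(len(sha256_digest), sha256_digest[:5])  # 64 5E884
```

A lookup keyed on the second prefix would land in the wrong bucket of a SHA-1 corpus, and no suffix there could ever match.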

Should I check a password my generator just made?

No. A freshly generated random password has, for all practical purposes, never existed before. It is not in any breach corpus, and the odds of it landing in one by chance are vanishingly small, so the check would come back clean every time and tell you nothing you did not already know. What that uselessness reveals is what breach checking is actually for.

Breach checking earns its keep on passwords a human chose or has been using. Those are the ones that might already be sitting in a dump somewhere, especially if they have been reused across sites or were never very original to begin with. If you are checking a password, it should be one with a history, not one that was random noise three seconds ago.

This is also why a good tool keeps generation and checking as separate jobs. The generator hands you something new and proves its strength with entropy. The checker takes something old and tells you whether it has already been compromised. The same privacy-first thinking runs through how the whole tool is built.
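A rough calculation shows why a fresh random password is so unlikely to appear in a corpus by chance. Every number here is an assumption chosen for illustration: a 20-character password drawn uniformly from 94 printable characters, and a corpus of half a billion entries, all generously treated as if they were passwords of that same shape.

```python
ALPHABET_SIZE = 94            # assumed: printable ASCII without the space
LENGTH = 20                   # assumed generator output length
ASSUMED_CORPUS = 500_000_000  # assumed corpus size, a conservative floor

# Number of distinct passwords the assumed generator could produce.
password_space = ALPHABET_SIZE ** LENGTH

# Upper bound on the chance that one uniformly random password
# happens to equal any entry in the corpus.
chance = ASSUMED_CORPUS / password_space

print(f"{chance:.1e}")
```

The result is far smaller than one in a billion billion billion, which is why a clean breach check on a generated password carries no information.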


Common questions

Is it safe to type my password into a breach checker?

It is safe if the checker uses k-anonymity, which means your password and its full fingerprint never leave your device. Only a short, deliberately ambiguous fragment of a hash is sent. That fragment is shared by hundreds of leaked hashes and by vastly more possible passwords, so the service cannot tell which one you checked. Avoid any checker that sends the whole password or full hash.

What is k-anonymity in a password check?

It is a method where your browser hashes the password locally, sends only the first five characters of that hash, and receives back every leaked hash sharing those five characters. Your device does the final comparison itself, so the service sees only the prefix and cannot tell which password you were checking.

Does the breach service see my password?

No. It never receives the password or the full hash. It only sees a five-character prefix that is common to a large crowd of different passwords, and it never learns the result of the comparison, which happens on your device.

Why does a breach check use SHA-1 if SHA-1 is broken?

SHA-1 is weak for collision resistance, which is not relevant here, because the hash is not protecting anything. It is only a shared format for turning passwords into comparable fingerprints. The privacy comes from sending a tiny fragment, not from the strength of the hash, and the breach databases were built using SHA-1.

Should I run a breach check on a newly generated password?

No. A freshly generated random password has, for all practical purposes, never existed before and will not be in any breach database, so the check tells you nothing. Breach checking is for passwords you chose yourself or have been using, which are the ones that might already be compromised.