For example: That a password’s entropy is the sum of its parts is a big assumption. By disregarding the “configuration entropy” (the entropy from the number and arrangement of the pieces), zxcvbn is purposely underestimating, by giving a password’s structure away for free: It assumes attackers already know the structure.

    # matches: the password's full array of candidate matches.
    # each match has a start index (match.i) and an end index (match.j) into
    # the password, inclusive.
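To make the additive assumption concrete, here is a minimal sketch in plain JavaScript (not the post's CoffeeScript). The match objects and their entropy values are made up for illustration, and `sequenceEntropy` is a hypothetical helper name, not part of zxcvbn.

```javascript
// Hypothetical matches: these entropy values are invented for illustration
// and are not zxcvbn's actual estimates.
const exampleSequence = [
  { pattern: "dictionary", token: "correct", entropy: 10 },
  { pattern: "bruteforce", token: "7!", entropy: 13.5 },
];

// Total entropy is simply the sum over the pieces. Nothing is added for
// the number or arrangement of the pieces (the "configuration entropy").
function sequenceEntropy(sequence) {
  return sequence.reduce((total, match) => total + match.entropy, 0);
}

const bits = sequenceEntropy(exampleSequence);
// The guess count implied by an entropy of `bits` is 2^bits.
const guesses = Math.pow(2, bits);
```

Because the structure is given away for free, a sum like this is a deliberately low estimate of the attacker's work.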

Without checking for common patterns, the practice of encouraging numbers and symbols means encouraging passwords that might only be slightly harder for a computer to crack, and yet frustratingly harder for a human to remember.

Especially because this is running browser-side as the user types, efficiency does matter.

To get something up and running I started with the simpler approach of calculating the sum for every possible non-overlapping subset, and it slowed down quickly.
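A rough way to see why the exhaustive approach bogs down is to count how many non-overlapping subsets there are to evaluate. The sketch below is plain JavaScript under my own assumptions: `countNonOverlappingSubsets` is a hypothetical helper, and it only counts candidates rather than scoring them.

```javascript
// Count every subset of `matches` whose members do not overlap.
// Each match covers password indices i..j, inclusive.
function countNonOverlappingSubsets(matches) {
  // Sort by start index so a chosen match only needs to be compared
  // against the end of the previously chosen match.
  const sorted = [...matches].sort((a, b) => a.i - b.i || a.j - b.j);

  // count(idx, lastEnd): number of valid subsets drawn from sorted[idx..]
  // given that the last chosen match ended at index lastEnd.
  function count(idx, lastEnd) {
    if (idx === sorted.length) return 1; // nothing left to choose
    let total = count(idx + 1, lastEnd); // option 1: skip this match
    if (sorted[idx].i > lastEnd) {
      total += count(idx + 1, sorted[idx].j); // option 2: take this match
    }
    return total;
  }

  return count(0, -1);
}
```

With n mutually disjoint matches, every one of the 2^n subsets is valid, so the number of sums to compute doubles with each additional match.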

Burnett ran a more recent study last year, looking at 6 million passwords, and found an insane 99.8% occur in the top 10,000 list, with 91% in the top 1,000.

The methodology and bias are an important qualifier: for example, since these passwords mostly come from cracked hashes, the list is biased towards crackable passwords to begin with. For the rest, I’d wager a large percentage are still predictable enough to be susceptible to a modest online attack.

    backpointers = [] # for the optimal sequence of matches up to k,
                      # holds the final match (match.j == k).
                      # null means the sequence ends w/ a brute-force char
    for k in [0...password.length]
      # starting scenario to try to beat:
      # adding a brute-force character to the minimum entropy sequence at k-1.

    make_bruteforce_match = (i, j) ->
      pattern: 'bruteforce'
      i: i
      j: j
      token: password[i..j]
      entropy: lg Math.pow(bruteforce_cardinality, j - i + 1)
      cardinality: bruteforce_cardinality
    k = 0
    match_sequence_copy = []
    for match in match_sequence # fill gaps in the middle
      [i, j] = [match.i, match.j]
      if i - k …

…, or null if the sequence doesn’t include such a match.

Typical of dynamic programming, constructing the optimal sequence requires starting at the end and working backwards.
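The forward pass and the backwards walk can be sketched together as follows. This is plain JavaScript rather than the post's CoffeeScript, and it makes two simplifying assumptions of mine: each match already carries a precomputed `entropy`, and `bruteforceCardinality` is passed in rather than derived from the password.

```javascript
// Sketch only. Assumes match.entropy is precomputed and that the caller
// supplies bruteforceCardinality (e.g. 26 for lowercase letters).
function minimumEntropyMatchSequence(password, matches, bruteforceCardinality) {
  const bitsPerChar = Math.log2(bruteforceCardinality);
  const upToK = [];        // upToK[k]: minimum entropy covering password[0..k]
  const backpointers = []; // backpointers[k]: final match ending at k, or null

  for (let k = 0; k < password.length; k++) {
    // Starting scenario to beat: one more brute-force character.
    upToK[k] = (k > 0 ? upToK[k - 1] : 0) + bitsPerChar;
    backpointers[k] = null;

    for (const match of matches) {
      if (match.j !== k) continue;
      // Best entropy before the match starts, plus the match itself.
      const prefix = match.i > 0 ? upToK[match.i - 1] : 0;
      const candidate = prefix + match.entropy;
      if (candidate < upToK[k]) {
        upToK[k] = candidate;
        backpointers[k] = match;
      }
    }
  }

  // Walk backwards from the end, following backpointers.
  const sequence = [];
  let k = password.length - 1;
  while (k >= 0) {
    const match = backpointers[k];
    if (match) {
      sequence.push(match);
      k = match.i - 1; // jump to just before this match
    } else {
      k -= 1; // brute-force character, step back one
    }
  }
  sequence.reverse();

  const entropy = password.length > 0 ? upToK[password.length - 1] : 0;
  return { entropy, sequence };
}
```

Each position is visited once and each match is considered only at its end index, so the work grows with the password length times the number of matches instead of with the number of subsets.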

, a fat 680k (320k gzipped), most of which is a dictionary.
