I often see a misunderstanding, among people with a wide range of stats experience, about what makes a rare event impressive.

It will often go something like this:

A sequence of rare things has happened!

Our sequence has 3 steps:

  1. Could be A at 99% or B at 1%
  2. Could be X at 90% or Y at 10%
  3. Could be Q at 60%, W at 25%, U at 15%

The normal outcome happened, A.

Then the rare outcome happened, Y.

Then a somewhat unlikely outcome happened, W.

If p(A) means the probability of A occurring, then p(A) * p(Y) * p(W) is the odds of them all occurring.

.99 * .1 * .25 = .02475 = 2.475%

And hey, look at that, pretty rare! This sequence of outcomes only happens about 1 out of every 40 times!
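The multiplication above can be sketched in a few lines. This is a minimal illustration, not a library API: `sequenceProbability` is a hypothetical helper name, and the inputs are the step probabilities from the example.

```javascript
// Hypothetical helper: multiply the probability of each observed step.
function sequenceProbability(stepProbabilities) {
  return stepProbabilities.reduce((product, p) => product * p, 1);
}

// p(A) = 0.99, p(Y) = 0.10, p(W) = 0.25
const pAYW = sequenceProbability([0.99, 0.1, 0.25]);
console.log((pAYW * 100).toFixed(3) + "%");
```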

But this doesn't represent what's interesting about rare events.

If you cared about that specific sequence of outcomes, this would make sense, but often that's an incorrect assumption.

Usually we simply care about seeing any improbable event: how rare is this result compared to all the other possible results?

Permutations

Let's reconsider the original example:

A or B then X or Y then Q, W, or U

We got the sequence, or permutation, of A -> Y -> W.

It could result in the following permutations:

| Permutation | Odds of Occurrence |
| --- | --- |
| A -> X -> Q | 53.46% |
| A -> X -> W | 22.275% |
| A -> X -> U | 13.365% |
| A -> Y -> Q | 5.94% |
| A -> Y -> W | 2.475% |
| A -> Y -> U | 1.485% |
| B -> X -> Q | 0.54% |
| B -> X -> W | 0.225% |
| B -> X -> U | 0.135% |
| B -> Y -> Q | 0.06% |
| B -> Y -> W | 0.025% |
| B -> Y -> U | 0.015% |
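A table like this can be generated rather than worked out by hand. The following is a rough sketch, assuming each step is represented as an object mapping outcome names to probabilities; `enumeratePermutations` is a hypothetical helper, not something from the original.

```javascript
// Each step maps outcome name -> probability (values from the example).
const steps = [
  { A: 0.99, B: 0.01 },
  { X: 0.9, Y: 0.1 },
  { Q: 0.6, W: 0.25, U: 0.15 },
];

// Hypothetical helper: extend every partial sequence with every outcome of
// the next step (a Cartesian product), multiplying probabilities as we go.
function enumeratePermutations(steps) {
  let partials = [{ path: [], probability: 1 }];
  for (const step of steps) {
    const extended = [];
    for (const partial of partials) {
      for (const [outcome, p] of Object.entries(step)) {
        extended.push({
          path: [...partial.path, outcome],
          probability: partial.probability * p,
        });
      }
    }
    partials = extended;
  }
  return partials;
}

const permutations = enumeratePermutations(steps);
for (const { path, probability } of permutations) {
  console.log(path.join(" -> "), (probability * 100).toFixed(3) + "%");
}
```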

And of those permutations, these are just as likely as ours or less likely:

| Permutation | Odds of Occurrence |
| --- | --- |
| A -> Y -> W | 2.475% |
| A -> Y -> U | 1.485% |
| B -> X -> Q | 0.54% |
| B -> X -> W | 0.225% |
| B -> X -> U | 0.135% |
| B -> Y -> Q | 0.06% |
| B -> Y -> W | 0.025% |
| B -> Y -> U | 0.015% |

So of the 12 total outcomes, 8 (ours included) are just as rare as or rarer than our permutation.

We can sum their odds of occurrence to see how likely it is to get any permutation at least that rare.

p(AYW) + p(AYU) + ... + p(BYU) = 4.96%

About double the original odds of getting that event.

Not as special.

This summation is what I mean by “rarity”: the odds of getting a permutation at least as unlikely as ours, out of all possible permutations.

Given this definition we can calculate the rarity of all of our permutations:

| Permutation | Odds of Occurrence | Rarity |
| --- | --- | --- |
| A -> X -> Q | 53.46% | 100% |
| A -> X -> W | 22.275% | 46.54% |
| A -> X -> U | 13.365% | 24.265% |
| A -> Y -> Q | 5.94% | 10.9% |
| A -> Y -> W | 2.475% | 4.96% |
| A -> Y -> U | 1.485% | 2.485% |
| B -> X -> Q | 0.54% | 1% |
| B -> X -> W | 0.225% | 0.46% |
| B -> X -> U | 0.135% | 0.235% |
| B -> Y -> Q | 0.06% | 0.1% |
| B -> Y -> W | 0.025% | 0.04% |
| B -> Y -> U | 0.015% | 0.015% |
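The rarity definition translates almost directly into code. Here is a minimal sketch, assuming the odds are given as fractions; `rarity` is a hypothetical helper, and the small `epsilon` is my own addition so that floating-point noise does not drop exact ties.

```javascript
// Hypothetical helper: rarity = total probability of every outcome that is
// no more likely than the observed one (the observed outcome included).
function rarity(probabilities, observed) {
  const epsilon = 1e-12; // tolerance so float noise doesn't exclude ties
  return probabilities
    .filter((p) => p <= observed + epsilon)
    .reduce((sum, p) => sum + p, 0);
}

// Odds of each permutation from the example, as fractions.
const permutationOdds = [
  0.5346, 0.22275, 0.13365, 0.0594, 0.02475, 0.01485,
  0.0054, 0.00225, 0.00135, 0.0006, 0.00025, 0.00015,
];

const rarityAYW = rarity(permutationOdds, 0.02475);
console.log((rarityAYW * 100).toFixed(3) + "%");
```

Calling `rarity` once per entry of `permutationOdds` fills in the whole rarity column.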

Simple enough.

Combinations and Paths

Instead of a series of distinct events, each with its own outcomes, sometimes we just have a single event repeated several times.

Say we have an event with three possible outcomes:

| Outcome | Odds of Occurrence |
| --- | --- |
| A | 50% |
| B | 35% |
| C | 15% |

We repeat it 3 times:

| Permutations |
| --- |
| A, A, A |
| A, A, B |
| A, B, A |
| ... |
| C, C, C |

We get a total of 3^3 = 27 permutations.

But many of these permutations have the same odds, because they contain the same combination of outcomes.

p(ABA) = p(AAB) = p(BAA)

So to simplify it let's just list the possible combinations:

| Combination | Odds of Occurrence |
| --- | --- |
| A, A, A | 12.5% |
| A, A, B | 8.75% |
| A, A, C | 3.75% |
| A, B, B | 6.125% |
| A, B, C | 2.625% |
| A, C, C | 1.125% |
| B, B, B | 4.2875% |
| B, B, C | 1.8375% |
| B, C, C | 0.7875% |
| C, C, C | 0.3375% |

We do have to update our odds, however, since they refer to the odds of a single permutation, not of any permutation that falls under that combination.

Instead, we have to multiply the odds by the number of permutations that produce that combination. I refer to this as the number of paths that lead to a combination.

If we counted how many permutations lead to these combinations manually we'd get the following:

| Combination | Paths |
| --- | --- |
| A, A, A | 1 |
| A, A, B | 3 |
| A, A, C | 3 |
| A, B, B | 3 |
| A, B, C | 6 |
| A, C, C | 3 |
| B, B, B | 1 |
| B, B, C | 3 |
| B, C, C | 3 |
| C, C, C | 1 |
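The manual count can be reproduced by brute force: list every ordered sequence, then group the ones that contain the same outcomes. A rough sketch, where `countPaths` is a hypothetical helper and sorting each sequence is just one convenient way to build an order-independent label:

```javascript
// Hypothetical helper: list every ordered sequence of `length` draws from
// `outcomes`, then count how many sequences share the same set of outcomes.
function countPaths(outcomes, length) {
  let sequences = [[]];
  for (let i = 0; i < length; i++) {
    sequences = sequences.flatMap((seq) => outcomes.map((o) => [...seq, o]));
  }
  const paths = new Map();
  for (const seq of sequences) {
    const key = [...seq].sort().join(", "); // order-independent label
    paths.set(key, (paths.get(key) ?? 0) + 1);
  }
  return paths;
}

const pathCounts = countPaths(["A", "B", "C"], 3);
for (const [combination, count] of pathCounts) {
  console.log(combination, count);
}
```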

Luckily, instead of counting, we can derive how many permutations produce each combination: take the factorial of the length of our sequence and divide it by the product of the factorials of each outcome's number of occurrences.

For example, using the sequence A, A, B:

n = 3
A = 2
B = 1

n! / (A! * B!)

n! = 3 * 2 * 1 = 6
A! = 2
B! = 1

6/(2*1)=3

Or for A, B, C:

n = 3
A = 1
B = 1
C = 1

n! / (A! * B! * C!)

n! = 3 * 2 * 1 = 6
A! = 1
B! = 1
C! = 1

6/(1*1*1)=6

Intuitively, the more times particular outcomes repeat, the fewer distinct permutations there are. In our equation, larger occurrence counts make the denominator larger, which results in fewer permutations.
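The formula above is short enough to write out directly. A minimal sketch, assuming the input is an array of occurrence counts; `pathsForCombination` and `factorial` are hypothetical helper names, and plain numbers only stay exact for fairly short sequences (up to 18!).

```javascript
// Plain factorial; exact in double precision only up to 18!.
function factorial(n) {
  let result = 1;
  for (let i = 2; i <= n; i++) result *= i;
  return result;
}

// counts: how many times each outcome occurs, e.g. [2, 1] for A, A, B.
function pathsForCombination(counts) {
  const n = counts.reduce((sum, c) => sum + c, 0);
  const denominator = counts.reduce((product, c) => product * factorial(c), 1);
  return factorial(n) / denominator;
}

console.log(pathsForCombination([2, 1])); // A, A, B
console.log(pathsForCombination([1, 1, 1])); // A, B, C
```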

Now to calculate the actual odds of the combinations.

| Combination | Odds * Paths | Resulting Odds |
| --- | --- | --- |
| A, A, A | 12.5% * 1 | 12.5% |
| A, A, B | 8.75% * 3 | 26.25% |
| A, A, C | 3.75% * 3 | 11.25% |
| A, B, B | 6.125% * 3 | 18.375% |
| A, B, C | 2.625% * 6 | 15.75% |
| A, C, C | 1.125% * 3 | 3.375% |
| B, B, B | 4.2875% * 1 | 4.2875% |
| B, B, C | 1.8375% * 3 | 5.5125% |
| B, C, C | 0.7875% * 3 | 2.3625% |
| C, C, C | 0.3375% * 1 | 0.3375% |
| Total | 27 Paths | 100% |
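Each row of this table is the odds of one ordering multiplied by the number of paths. A small sketch of that calculation, assuming the outcome odds are stored in an object keyed by outcome name; `combinationOdds` is a hypothetical helper.

```javascript
const outcomeOdds = { A: 0.5, B: 0.35, C: 0.15 };

function factorial(n) {
  let result = 1;
  for (let i = 2; i <= n; i++) result *= i;
  return result;
}

// combination: array of outcome names, e.g. ["A", "A", "B"].
function combinationOdds(combination, odds) {
  // Odds of one specific ordering of these outcomes.
  const single = combination.reduce((product, o) => product * odds[o], 1);
  // How many times each outcome occurs.
  const counts = {};
  for (const o of combination) counts[o] = (counts[o] ?? 0) + 1;
  // Paths = n! divided by the factorial of each occurrence count.
  const paths = Object.values(counts).reduce(
    (result, c) => result / factorial(c),
    factorial(combination.length)
  );
  return single * paths;
}

const oddsAAB = combinationOdds(["A", "A", "B"], outcomeOdds);
console.log((oddsAAB * 100).toFixed(4) + "%");
```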

And for our rarities:

| Combination | Odds | Rarity |
| --- | --- | --- |
| A, A, B | 26.25% | 100% |
| A, B, B | 18.375% | 73.75% |
| A, B, C | 15.75% | 55.375% |
| A, A, A | 12.5% | 39.625% |
| A, A, C | 11.25% | 27.125% |
| B, B, C | 5.5125% | 15.875% |
| B, B, B | 4.2875% | 10.3625% |
| A, C, C | 3.375% | 6.075% |
| B, C, C | 2.3625% | 2.7% |
| C, C, C | 0.3375% | 0.3375% |

About what you'd expect.

Coin Flips

If, instead of all these options, we're talking about pure coin flips (two outcomes, each with 50% odds of occurring), we can take advantage of some fun properties.

Binomial Coefficients!

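// Two-tailed sum of binomial coefficients divided by 2^n. For fair coin
// flips this is the rarity of seeing k heads in n flips.
function binomialSum(n, k) {
  // The distribution is symmetric, so fold k onto the lower tail.
  if (k > Math.floor(n / 2)) {
    k = Math.abs(k - n);
  }
  let b = 1; // binomial coefficient C(n, 0)
  let sum = 1; // running total C(n, 0) + ... + C(n, i)
  for (let i = 1; i <= k; i++) {
    // C(n, i) from C(n, i - 1); multiply first so b stays a whole number.
    b = (b * (n - i + 1)) / i;
    sum += b;
  }
  // Double for the matching upper tail; cap at 1 because for even n the
  // two tails share the middle count when k = n / 2.
  return Math.min(1, (sum / 2 ** n) * 2);
}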
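My reading of the function above is that it returns the rarity of k heads in n flips without enumerating anything. As a cross-check under that assumption, here is a brute-force sketch that applies the rarity definition directly to every possible head count; `coinRarityBruteForce` is a hypothetical name.

```javascript
// Hypothetical brute-force version: rarity of seeing exactly k heads in n
// fair flips, computed straight from the definition of rarity.
function coinRarityBruteForce(n, k) {
  // coefficients[j] = C(n, j), the number of flip sequences with j heads.
  const coefficients = [1];
  for (let j = 1; j <= n; j++) {
    coefficients.push((coefficients[j - 1] * (n - j + 1)) / j);
  }
  // Add up every head count that is no more likely than the observed one.
  const observed = coefficients[k];
  let total = 0;
  for (const c of coefficients) {
    if (c <= observed) total += c;
  }
  return total / 2 ** n;
}

console.log(coinRarityBruteForce(10, 2));
```

For 10 flips, 2 heads is as unlikely as 8 heads, so both tails (0, 1, 2 and 8, 9, 10 heads) count toward the rarity.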

Dice

This can presumably be extended to dice (n outcomes, each with 1 in n odds of occurring), but I can't be bothered right now.
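For what it's worth, here is one possible brute-force sketch of the dice case. It assumes "dice" means the rarity of a combination of faces across several rolls of one fair die, which is my interpretation rather than anything worked out above. `diceRarity` is a hypothetical helper, and it enumerates every split of the rolls across faces, so it is no shortcut.

```javascript
function factorial(n) {
  let result = 1;
  for (let i = 2; i <= n; i++) result *= i;
  return result;
}

// Hypothetical sketch. observedCounts[i] = how many rolls showed face i.
// Every ordered roll has the same odds, so a combination's odds are simply
// paths / sides^rolls, and comparing paths is enough to compare odds.
function diceRarity(sides, rolls, observedCounts) {
  const pathsFor = (counts) =>
    counts.reduce((result, c) => result / factorial(c), factorial(rolls));
  const observedPaths = pathsFor(observedCounts);

  let rarePaths = 0;
  // Walk every way to split `rolls` across the faces.
  function walk(face, remaining, counts) {
    if (face === sides - 1) {
      const paths = pathsFor([...counts, remaining]);
      if (paths <= observedPaths) rarePaths += paths;
      return;
    }
    for (let c = 0; c <= remaining; c++) {
      walk(face + 1, remaining - c, [...counts, c]);
    }
  }
  walk(0, rolls, []);
  return rarePaths / sides ** rolls;
}

// Two six-sided dice both showing the first face.
console.log(diceRarity(6, 2, [2, 0, 0, 0, 0, 0]));
```

With two sides this reduces to the coin-flip case, which makes for a handy sanity check.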

Complexity

It also seems like you should be able to extrapolate from our Binomial Coefficients algorithm to our permutations, or at the very least to our repeated-events example, but I've run into issues.

The naive way would be to generate all possible combinations, calculate the odds for each combination, and use n! / (A! * B! * ... * Z!) to determine how many permutations result in each combination.

The big O isn't terribly happy with this scaling.
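For reference, the naive approach for repeated events might look something like the sketch below. `combinationRarities` is a hypothetical helper; it enumerates every combination as a vector of occurrence counts, and the rarity step compares every row against every other row, so it scales about as badly as you'd guess.

```javascript
function factorial(n) {
  let result = 1;
  for (let i = 2; i <= n; i++) result *= i;
  return result;
}

// odds: outcome name -> probability; repeats: how many times the event runs.
function combinationRarities(odds, repeats) {
  const names = Object.keys(odds);
  const combos = [];

  // Enumerate count vectors: how many times each outcome appears.
  function walk(index, remaining, counts) {
    if (index === names.length - 1) {
      combos.push([...counts, remaining]);
      return;
    }
    for (let c = remaining; c >= 0; c--) {
      walk(index + 1, remaining - c, [...counts, c]);
    }
  }
  walk(0, repeats, []);

  const rows = combos.map((counts) => {
    // Paths = n! divided by the factorial of each occurrence count.
    const paths = counts.reduce((r, c) => r / factorial(c), factorial(repeats));
    // Odds of one specific ordering.
    const single = counts.reduce((p, c, i) => p * odds[names[i]] ** c, 1);
    const label = counts.flatMap((c, i) => Array(c).fill(names[i])).join(", ");
    return { label, odds: single * paths };
  });

  // Rarity: total odds of every row that is no more likely than this one.
  const epsilon = 1e-12; // tolerance so float noise doesn't exclude ties
  for (const row of rows) {
    row.rarity = rows
      .filter((other) => other.odds <= row.odds + epsilon)
      .reduce((sum, other) => sum + other.odds, 0);
  }
  return rows.sort((a, b) => b.odds - a.odds);
}

const rarityRows = combinationRarities({ A: 0.5, B: 0.35, C: 0.15 }, 3);
for (const { label, odds, rarity } of rarityRows) {
  console.log(label, (odds * 100).toFixed(4) + "%", (rarity * 100).toFixed(4) + "%");
}
```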

Specifically, for our permutations you run into complications from having a different number of outcomes for each event in the sequence. Too complicated, don't care.

This is probably all solved already if I knew the secret club knock (math notation).

Comparing Rarities

One thing that doesn't quite feel intuitive is that certain permutations or combinations can have very similar odds of occurring yet very large differences in rarity (or vice versa).

Let's look at our example from the “Combinations and Paths” section:

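| Combination | Odds | Rarity |
| --- | --- | --- |
| A, A, B | 26.25% | 100% |
| A, B, B | 18.375% | 73.75% |
| A, B, C | 15.75% | 55.375% |
| A, A, A | 12.5% | 39.625% |
| A, A, C | 11.25% | 27.125% |
| B, B, C | 5.5125% | 15.875% |
| B, B, B | 4.2875% | 10.3625% |
| A, C, C | 3.375% | 6.075% |
| B, C, C | 2.3625% | 2.7% |
| C, C, C | 0.3375% | 0.3375% |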

While A, A, A has 12.5% odds and A, A, C has 11.25% odds, their rarities differ by 12.5 percentage points. While that is literally correct given how rarity was defined, it feels like there could be a better way to show that A, A, A and A, A, C are actually more similar in rarity than those numbers suggest.

The simple solution I've used is to take the negative natural log of each rarity, -ln(Rarity), to better compare the differences in magnitude between them.

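| Combination | Odds | Rarity | -ln(Rarity) |
| --- | --- | --- | --- |
| A, A, B | 26.25% | 100% | 0.00 |
| A, B, B | 18.375% | 73.75% | 0.30 |
| A, B, C | 15.75% | 55.375% | 0.59 |
| A, A, A | 12.5% | 39.625% | 0.93 |
| A, A, C | 11.25% | 27.125% | 1.30 |
| B, B, C | 5.5125% | 15.875% | 1.84 |
| B, B, B | 4.2875% | 10.3625% | 2.27 |
| A, C, C | 3.375% | 6.075% | 2.80 |
| B, C, C | 2.3625% | 2.7% | 3.61 |
| C, C, C | 0.3375% | 0.3375% | 5.69 |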

While not perfect, it seems more accurate to consider C, C, C about two magnitudes rarer than B, C, C, and A, A, C less than half a magnitude rarer than A, A, A (a magnitude here being one unit of natural log).
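The transform itself is a one-liner. A minimal sketch, assuming the log in the table is the negative natural log (that is what reproduces the listed values); `logRarity` is a hypothetical name.

```javascript
// Negative natural log of a rarity given as a fraction: 0 for a rarity of
// 100%, and larger as outcomes get rarer.
function logRarity(rarity) {
  return -Math.log(rarity);
}

// Rarities from the combinations example, as fractions.
const exampleRarities = {
  "A, A, A": 0.39625,
  "A, A, C": 0.27125,
  "B, C, C": 0.027,
  "C, C, C": 0.003375,
};

for (const [combination, rarity] of Object.entries(exampleRarities)) {
  console.log(combination, logRarity(rarity).toFixed(2));
}
```

Differences on this scale correspond to ratios of rarity: C, C, C is 8 times rarer than B, C, C, which is a gap of ln(8), or about 2.08.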