I have been sucked into the glory of Zynga games.

Zynga is a gaming company that focuses on simple social media games. My favorites include the hits:

**Words with Friends**, **Scramble with Friends**, and **Hanging with Friends**.


Words with Friends? Scrabble.

Scramble with Friends? Boggle.

Hanging with Friends? Hangman.

Let’s talk about Hanging with Friends (HwF) for a minute. It’s a simple game. You have a limited number of guesses to figure out a word. Zynga gets the strategy right by giving you fewer guesses for longer words (a longer word has more letters, so any single guess is more likely to be correct). For each letter you guess incorrectly, you lose a balloon. Once you lose five balloons, you lose.

When it’s your turn to make a word, you’re given 12 random letters from which to craft the word your opponent must guess. But… are they really random? If I knew the letter distribution, I’d be better at guessing my opponents’ words.

So I recorded 12 letters per game for 100 games. 1,200 letters.

The first obvious conclusion: There are always four vowels.
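Recording the racks by hand lends itself to a small tally script. A minimal sketch, assuming each game’s 12 letters were written down as one string; the helper name `tally_racks` and the two sample racks are hypothetical, not from my recorded games.

```python
from collections import Counter

# Only A, E, I, O, U count as vowels here (Y is treated as a consonant),
# matching the vowel split used in this post.
VOWELS = set("AEIOU")

def tally_racks(racks):
    """Count letters across all recorded racks and check the four-vowel rule.

    Each rack is a 12-letter string copied from one HwF game.
    Returns the overall letter counts and the racks that do NOT
    contain exactly four vowels.
    """
    totals = Counter()
    exceptions = []
    for rack in racks:
        rack = rack.upper()
        if len(rack) != 12:
            raise ValueError(f"expected 12 letters, got {len(rack)}: {rack!r}")
        totals.update(rack)
        vowel_count = sum(1 for letter in rack if letter in VOWELS)
        if vowel_count != 4:
            exceptions.append(rack)
    return totals, exceptions

# Two hypothetical racks, for illustration only.
totals, exceptions = tally_racks(["AETRSLNDOIGH", "EEUOTBCMWRSP"])
print(totals["E"], exceptions)  # prints: 3 []
```

Any rack that lands in `exceptions` would be a counterexample to the four-vowel observation.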

So how were the four hundred vowels split in my sampling?

A: 65

E: 134

I: 81

O: 71

U: 49

Not an even distribution, but that’s hardly surprising. In fact, my original hypothesis was that the letter distribution of HwF would match the letter distribution in Words with Friends (WwF).

Words with Friends’ vowel distribution (out of 104 tiles):

A: 9

E: 13

I: 8

O: 8

U: 4

Conveniently, the WwF vowel counts add up to 42, roughly a tenth of the 400 vowels in my HwF sampling, so multiplying them by ten gives a quick normalization. How close are the numbers once scaled? Pretty close.

Format:

Letter: my sampling vs. WwF letter distribution (scaled ×10)

A: 65 vs 90

E: 134 vs 130

I: 81 vs 80

O: 71 vs 80

U: 49 vs 40
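Multiplying by ten is a shortcut, since the WwF vowel counts above (9 + 13 + 8 + 8 + 4) total 42 rather than 40. A minimal sketch that rescales them by the exact factor 400/42 instead, using only the numbers from this post:

```python
# Vowel counts from my 100 HwF games (400 vowels in total).
hwf = {"A": 65, "E": 134, "I": 81, "O": 71, "U": 49}
# Words with Friends vowel tile counts, as listed above.
wwf = {"A": 9, "E": 13, "I": 8, "O": 8, "U": 4}

def normalize(counts, target_total):
    """Scale counts proportionally so they sum to target_total."""
    factor = target_total / sum(counts.values())
    return {letter: n * factor for letter, n in counts.items()}

expected = normalize(wwf, sum(hwf.values()))
for letter in hwf:
    print(f"{letter}: {hwf[letter]} vs {expected[letter]:.1f}")
```

With the exact factor the WwF column becomes about 85.7, 123.8, 76.2, 76.2, and 38.1, which narrows the gap for A but widens it for E and U.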

Does the same hold true for the consonants? (WwF numbers are normalized to the 800 consonants in my sampling.)

B: 23 vs 27

C: 29 vs 27

D: 61 vs 67

G: 57 vs 40

H: 53 vs 53

J: 10 vs 13

K: 9 vs 13

L: 69 vs 53

M: 29 vs 27

N: 61 vs 67

P: 21 vs 27

Q: 11 vs 13

R: 57 vs 80

S: 77 vs 67

T: 92 vs 93

V: 33 vs 27

W: 23 vs 27

X: 11 vs 13

Y: 30 vs 27

Z: 15 vs 13
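To see which consonants stray furthest, a short sketch can sort the table above by the gap between the two columns. It uses the post’s numbers as written; F doesn’t appear in the table, so it isn’t included. The helper name `largest_gaps` is hypothetical.

```python
# Consonant counts from the table above: (my HwF sampling, normalized WwF).
consonants = {
    "B": (23, 27), "C": (29, 27), "D": (61, 67), "G": (57, 40),
    "H": (53, 53), "J": (10, 13), "K": (9, 13), "L": (69, 53),
    "M": (29, 27), "N": (61, 67), "P": (21, 27), "Q": (11, 13),
    "R": (57, 80), "S": (77, 67), "T": (92, 93), "V": (33, 27),
    "W": (23, 27), "X": (11, 13), "Y": (30, 27), "Z": (15, 13),
}

def largest_gaps(table, top=5):
    """Return (letter, observed - expected) for the letters that differ most."""
    ranked = sorted(
        table.items(),
        key=lambda item: abs(item[1][0] - item[1][1]),
        reverse=True,
    )
    return [(letter, observed - expected) for letter, (observed, expected) in ranked[:top]]

for letter, gap in largest_gaps(consonants):
    print(f"{letter}: {gap:+d}")
```

R (-23), G (+17), L (+16), and S (+10) come out on top; every other listed consonant is within 6 of its WwF value.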

Fascinating. It’s pretty close (with exceptions like G, L, R, and S). I’ll leave it to the statisticians among you to calculate the significance and confidence interval of my sample relative to the WwF distribution.
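For anyone who wants a starting point, here is a minimal sketch of a chi-squared goodness-of-fit statistic for the vowel counts, with expected values taken from the WwF vowel tiles (9, 13, 8, 8, 4). It treats every vowel as an independent draw, which is an assumption: the game may deal racks in some other way. The cutoff 9.488 is the standard 5% critical value for 4 degrees of freedom.

```python
def chi_squared(observed, weights):
    """Chi-squared goodness-of-fit statistic.

    observed: counts seen in the sample.
    weights:  relative frequencies under the hypothesis (any scale).
    """
    total = sum(observed)
    weight_sum = sum(weights)
    stat = 0.0
    for obs, w in zip(observed, weights):
        expected = total * w / weight_sum  # expected count for this letter
        stat += (obs - expected) ** 2 / expected
    return stat

# Order: A, E, I, O, U
hwf_vowels = [65, 134, 81, 71, 49]
wwf_vowels = [9, 13, 8, 8, 4]

stat = chi_squared(hwf_vowels, wwf_vowels)
CRITICAL_5PCT_DF4 = 9.488  # chi-squared table value, 5 categories -> 4 degrees of freedom
print(f"chi-squared = {stat:.2f}, exceeds 5% cutoff: {stat > CRITICAL_5PCT_DF4}")
```

On these numbers the statistic comes out around 9.62, just above the cutoff, with the shortfall of A’s contributing about half of it. So the vowel split is borderline rather than clearly different. If SciPy is available, `scipy.stats.chisquare` computes the same statistic along with a p-value.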

What does this tell us? Well, if my assumption is correct, you’d never see more copies of a letter than appear in a Words with Friends game. You’d never see two Z’s or two Q’s show up in your letter selection.

In my sampling, this holds.

This chart shows the number of times a letter has appeared more than once in a single game. For example, there are 2 instances in my sampling where three S’s showed up in the word-building box. Never have I seen two K’s or two J’s appear.

Below each letter is the total number of that letter in a WwF game. It matches fairly well, though in my HwF games I’ve never seen more than 3 of any one letter. This might be a limitation of the game, or just chance. Keep an eye out in your games. If you ever find an instance with multiple Z’s or J’s, let me know! I’d love to hear it. Or, if you ever get four of the same letter (including vowels), I’d be interested.
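The “never more than the bag holds” idea can be written as a simple check on a rack. A minimal sketch, assuming WwF tile counts taken from this post: the vowel counts listed earlier, plus consonant counts recovered by undoing the normalization (multiplying by 60/800, so B’s 27 becomes 2). F isn’t in the post’s table; the 2 used for F is an assumption. Blank tiles are ignored, and `exceeds_wwf_bag` is a hypothetical helper name.

```python
from collections import Counter

# Per-letter tile counts for Words with Friends (blanks ignored).
# Vowels come from the post; consonants are the post's normalized values
# scaled back down. F is not in the post, so F=2 here is an assumption.
WWF_TILES = {
    "A": 9, "B": 2, "C": 2, "D": 5, "E": 13, "F": 2, "G": 3, "H": 4,
    "I": 8, "J": 1, "K": 1, "L": 4, "M": 2, "N": 5, "O": 8, "P": 2,
    "Q": 1, "R": 6, "S": 5, "T": 7, "U": 4, "V": 2, "W": 2, "X": 1,
    "Y": 2, "Z": 1,
}

def exceeds_wwf_bag(rack):
    """Return {letter: count} for letters the rack holds more often than the WwF bag does."""
    counts = Counter(rack.upper())
    return {letter: n for letter, n in counts.items() if n > WWF_TILES.get(letter, 0)}

print(exceeds_wwf_bag("SSSTAEIOLRND"))  # prints: {}  (three S's fit in the bag)
print(exceeds_wwf_bag("ZZQAEIOTRSLN"))  # prints: {'Z': 2}  (two Z's would break the hypothesis)
```

A non-empty result from a real HwF rack would be exactly the kind of sighting worth reporting.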

This has been a fun experiment.

Oh Mike, this made me so happy to read.

Fun fact: Last night in one game I got two B’s, and in another I got two C’s.

The accuracy of my model is improving.

Have you compared the letter distributions to the distributions in normal English language? I’d be interested to see if they just programmed it in that way. Cheap and easy.

http://en.wikipedia.org/wiki/Letter_frequency#Relative_frequencies_of_letters_in_the_English_language