Posts Tagged ‘Google’

How to Win (Or Maybe Not) on Wheel of Fortune

Wednesday, August 19th, 2009

Wheel! Of! Fortune!One nice thing about being a programmer is that you can automate certain calculations that you’d have to be crazy to attempt any other way. While some would see a non-programmer attempting to figure out some of this stuff as borderline insane, we coders just come across as eccentric with a lot of time on our hands. If people ask a question fairly frequently, and said question involves lots of number-crunching, you can bet some coder somewhere has taken a crack at trying to crunch those numbers.

Case in point: Wheel of Fortune. Now, I’ve never really been a huge fan of the show, but I see it a lot anyway. It’s on after Jeopardy!, which I really do enjoy and try to watch fairly frequently, so I’ve seen my fair share of episodes of Wheel. One thing that always got me was the final puzzle. For those of you who don’t know, this is how it works: the winning contestant from all the previous rounds must solve a shorter, harder puzzle by himself in a small span of time. He is given a (usually unhelpful) hint in the form a category, and some of the most common letters in the English language (R, S, T, L, N, and E) are already shown. Then the contestant must choose three more consonants and one more vowel. If any of these letters occur in the puzzle Vanna White shows them, and the players has ten seconds to guess what the word or phrase is.

There are other factors at play here, but they don’t relate to what I’m interested in most, namely: What are the best letters to pick? Can we do an analysis of the letter frequency of a whole bunch of these puzzles? Can we determine whether or not the producers of the show pay attention to these frequencies? Thanks to the Internet and some spare time in the hands of a programmer, the answer is a (qualified) yes, we can. Please note that while I do enjoy math, I am most certainly not a mathematician, so this is just an armchair analysis, and not a scholar’s take.

First, I needed a set of data. As interested as I was in determining the letter frequencies, I wasn’t about to spend six months collecting data by actually watching the end of each show. I have the Internet to do that sort of stuff for me! In this case, I found this forum, whose residents had already done the hard work. I was able to grab the final puzzles from a couple of threads on this site, and store them in some text files, one puzzle to a line. Then, I wrote a short Python script to parse through the results and generate links to a Google Charts representation of the data. If you’re going to screw around on the Internet, why waste time inputting data into Excel?

The Code

Below is the code. Please note that while I have commented it, it’s task-oriented code. I did not sit down and think things through for hours on end; I was more interested in the results produced by the code than the process of making it. To that end it may be a bit rough around the edges. If you’re a Python programmer you might even think it un-Pythonic.

Show/Hide Source Code

from operator import itemgetter # for sorting
import sys # for command-line arguments
 
# makes sorting dictionaries prettier
def sortDictionary (s):
    return sorted(s.items(), key = itemgetter(1), reverse = True)
 
hexColors = ["F05DCF", "F4B213", "7BB5FE", "19B915",
       "C913E4", "E38080", "4891EB", "DCF725", "E02EB0",
       "EE7D18", "16D949", "73E0C9", "22F1DB", "1460A1",
       "CF8040", "FFFFFF", "CF8054", "204E00", "2B1160",
       "87513C", "DECEE9", "C913E4", "83B892", "597D4C",
       "DACA5D", "2F486B", "D79E17", "826889", "359DA1",
       "DE7A43", "568C51", "FBF786"]
 
 
if __name__ == "__main__":
    # set up command line arguments
    # thumb:   creates a smaller file, with shorter (or no, depending on letter count) labels
    # verbose: prints out each list of letters and frequencies, too
 
    if (len(sys.argv) < 2 or len(sys.argv) > 4):
        print "Usage: wof.py [filename] [t|f] [v]"
        exit()
 
    thumb = False
    verbose = False
    fileName = sys.argv[1]
    if len(sys.argv) > 2 and sys.argv[2].lower() == "t":
        thumb = True
    if len(sys.argv) > 3 and sys.argv[3].lower() == "v":
        verbose = True
 
 
    # set up lists of letters
    letters = ["a", "b", "c", "d", "e", "f", "g", "h",
               "i", "j", "k", "l", "m", "n", "o", "p",
               "q", "r", "s", "t", "u", "v", "w", "x",
               "y", "z"]
    consonants = ["b", "c", "d", "f", "g", "h",
                "j", "k", "l", "m", "n", "p",
               "q", "r", "s", "t", "v", "w", "x",
               "y", "z"]
    vowels = ["a", "e", "i", "o", "u"]
 
    # letters to exclude (already given to you on game show)
    already = ["r", "s", "t", "l", "n", "e"]
 
 
    # set up frequency dictionaries {leter : number of occurences}
    allFrequencies = dict((letter, 0) for letter in letters)
    vowelFrequencies = dict((letter, 0) for letter in vowels)
    consonantFrequencies = dict((letter, 0) for letter in consonants);
 
    # Read the data file. Should consist of one final puzzle
    # solution per line, optionally lines can start with "#" for a comment
    file = open(fileName)
    while True:
        line = file.readline()
        if not line: break #end of loop
        if line[0] == "#": continue # skip comments
        for letter in line:
            lower = letter.lower()
            if lower in allFrequencies:
                allFrequencies[lower] = allFrequencies[lower] + 1
            if lower in already: # exclude RSTLNE from vowels and consonants
                break
            if lower in vowelFrequencies:
                vowelFrequencies[lower] = vowelFrequencies[lower] + 1
            if lower in consonantFrequencies:
                consonantFrequencies[lower] = consonantFrequencies[lower] + 1
 
    #sort dictionaries
    allFrequencies = sortDictionary(allFrequencies);
    vowelFrequencies = sortDictionary(vowelFrequencies);
    consonantFrequencies = sortDictionary(consonantFrequencies);
 
    if verbose:
        #display the lists
        print "ALL:\n", allFrequencies
        print "\nVOWELS:\n", vowelFrequencies
        print "\nCONSONANTS:\n", consonantFrequencies
 
 
    charts = {"All+Letters" : allFrequencies, "Vowels" : vowelFrequencies,
             "Consonants" : consonantFrequencies}
 
    for chart in charts:
        # make the image URLs, using Google Charts
        if thumb:
            url = "http://chart.apis.google.com/chart?chs=100x100&cht=p"
        else:
            url = "http://chart.apis.google.com/chart?chs=400x300&cht=p"
 
        # build lists for data series and its labels
        labels = []
        data = []
        for entry in charts[chart]:
            if int(entry[1]) > 0: # exclude any letters not used
                # make sure a thumbnail doesn't have too many labels to clutter it
                if thumb and len(charts[chart]) <= 6:
                    labels.append(entry[0].upper())
                else:
                    labels.append(entry[0].upper() + "+(" + str(entry[1]) + ")")
                data.append(str(entry[1]))
 
        # set them to the query string parts for data and labels
        dataRange = "&chd=t:" + ",".join(data);
        if (thumb and len(charts[chart]) >= 6):
            labelRange = ""
        else:
            labelRange = "&chl=" + "|".join(labels);
 
 
        # build the array of chart colors
        chartColors = "&chco=" + ",".join(hexColors[0:len(charts[chart])-2])
 
        # build final URL
        url = url + dataRange + labelRange + "&" + chartColors + "&chtt=" + chart;
        print "\n", chart, "\n", url

The Results

Might as well show off the pretty, pretty pictures, huh? Click any graph below to enlarge it.

At first look, the data is not too promising. I can give you two letters that will increase your chances of getting a ‘hit’, and one of them might come in handy. In our six months’ worth of data, O is the favorite… but not by much. Looking at both periods, it seems pretty clear that somebody at Merv Griffin Productions is responsible for distributing the vowels O, I, and A across the spectrum so none of them shows up too frequently. Notice how I and O are tied on the most recent set of data, but A is the second-most frequent vowel on the older figures. Combining all the numbers, we see that these three vowels are essentially tied in frequency, with U in a slightly lower class. But at least it’s something to work with, right? From the last six months of Wheel, it looks like ‘O’ is the best vowel to go with.

Now what about consonants? I was most excited when I pulled up the 2009 consonants graph (the first one I did), because you can clearly see that the top two letters are definitely a bit more common than the rest, and even the top three look pretty solid. H, G, and D… could those be the winning ones? My excitement faded, however, as I ran the earlier set of data through the script. Looking at the combined chart, H still has a statistically significant lead. But you’ll have no luck trying to discover the three letters to choose. But we can limit our options a little. F, G, and B are all clearly separated from the next letter (D) in the combined graph, with a decent-sized gap between them. It’s harder to say for certain, but it looks like the producers may be balancing these top four letters throughout their puzzles.

So, what to go with? You should definitely choose O for your vowel. H is the consonants which statistically is most likely to occur. Then any of F, G, and B would probably do you some good.

And how about those shifty producers? Are they gaming the final answers, so they don’t have to give out as much prize money? Are they maybe picking and choosing their phrases to deflect somebody who did a little research before heading down to the studio to play? Well, let’s try to find a pattern in the frequency of letters in the English language (please note that I’ve removed RSTLNE from these graphs):

Right away you should notice some major discrepancies between the Wheel of Fortune data set and written English. While H shows up at the top where we’d expect, D is clearly in a much higher class than B, F, and G that we picked above. In fact, F and G aren’t even in the top five, and other letters that show up often in English aren’t placed very high in the combined final puzzle data. This is probably the result of producers fine-tuning their answers over the years, either to avoid the letters contestants chose most often or to more evenly distribute the winning ones.

This is such a small data set, however, that we shouldn’t rely on it too heavily. After all, Wheel of Fortune has been on the air for twenty-six years, and we only have half a year’s worth of data, or around 2% of all that is available. But a small attempt at analyzing this data is probably better than going in blind and picking letters that you ‘think’ show up frequently.

There are other ways that this quick-and-dirty analysis can be improved. Mine is a pretty naive approach. Going over some basic rules of English might help to improve the method. For example, breaking the final puzzles down into phonemes could yield more information, as might looking at letter pairs instead of single letters. For instance, Q never occurs without U, and some letters are more common after others. This is especially useful in our task, as we need to choose three consonants but only one vowel. Consonants are most often followed by vowels, so consonant pairs increase the uniqueness of a phrase. P is often followed by R, L, or H, for example. Looking for patterns in the words themselves might also yield better predictions about what letters would be better to guess.

Another thing to realize is that these numbers are averages from a discrete set. Some puzzles might include the high-frequency letters and be solvable with only those (and RSTLNE), while some might not include a single one of the high-frequency picks. Picking from one of the high scorers might improve your odds of getting more letters, but it doesn’t guarantee that you’ll get some, or even any. You might wind up with something like ‘Blind Luck’, which doesn’t contain an O or an H. These estimates can help you, but only so much.

So, after all these calculations, I now know what I would do if I ever found myself in Wheel’s final round. I’d go for H, G, and B (G and B having been arbitrarily picked over F), and then O as my vowel. And maybe I’d win big. Of course, the biggest factor in all this is your ability to manipulate letters and words in your mind. That’s one subject in which I lack skill, as evinced by the Boggle-solving program I wrote (a story for another time). So I might tank, even if my statistically-chosen letters filled out quite a bit of the puzzle. A lot of it does come down to luck, which was probably the producers’ intention all along.

Google and China

Tuesday, April 25th, 2006

I’ve been thinking about Google and China, and I don’t understand why Google would concede to the Chinese Government’s demands that Google China censor its search results. The billion potential customers might be a potential draw, except for one fact — capitulating to censors would be against Google’s informal corporate motto: “Don’t be evil.”

Although evil is indeed a subjective quality, here it is not. The fact that a government (which laughingly calls itself the “People’s Republic of China”) would keep any information from its citizens automatically disqualifies it as being a republic of the people. Such a government is a republic of… the republic. Google is only skirting evil by acquiescing too China’s demands; Yahoo! is being downright evil by providing China with information it needs to prosecute dissenters.

A corporation should not be evil and in the United States, at least, Google isn’t. But what a difference a government makes. Google as a corporation (and all American companies for that matter) should stand for truth, equality, and making a profit — but the last goal should never come at the expense of the first two. If anything, Google should be doing everything it can to fight China’s censorship. It should answer the “People’s” government’s demand with a fuck you gleam in its eye, and a list of search results that inexplicably returns links to information about the Tiananmen Protests. And if the the PMRC doesn’t like it, then it can look elsewhere for a search engine for its sandboxed Internet.

Time for some Googlebombing

Tuesday, January 3rd, 2006

According to digg, Bjoern Harste, a German blogger, received a cease and desist order from his city regarding his blog. Apparently, they want him to stop using their name in his blog, because it’s coming up in the top 10 on Google and they don’t want surfers to think that his blog is associated with the city. Obviously, Sozialgericht Bremen doesn’t quite understand how Google works.

I can’t read the German, but I ran it through Google translation. Besides, Cease and Desist orders just piss me off. Everybody’s so damned touchy about their image and so forth. WTF ever.

If you want to help the city out, please don’t link to http://www.shopblogger.de/blog/archives/2715-Der-Brief-vom-Sozialgericht.html with “Sozialgericht Bremen” as your link text, as this would push Bjoern’s page even higher in the ranking. Wink, wink.