Organizing The Music

October 12th, 2009

So I’ve begun the process of manually merging my music collection. It’s a mess, quite frankly. I’ve got MP3s I’ve ripped or purchased on four different computers, spread throughout many directories. Compounding this is my iPod, which usually carries the latest tracks that I’ve added. Here’s how I’m organizing things. The fun part is that I got to write a Python script to help out.

First Steps

I’ve got one folder that was my primary music folder throughout my time in school. It rests on my file server. It generally contains all my music and is the most authoritative ‘source.’ In addition, it was the initial source, the ‘seed’ if you will, for the tracks on the iPod. At one point in the distant past, my iPod contained the tracks from this folder and nothing else. This is what I’m going to start with. To really drive home the point of my fresh start, I created a share on my file server and started anew. These tracks wound up in a folder called ‘library.’

This is already a good start. I’ve been pretty meticulous in organizing my music library, essentially by artist then by album. The /library/ folder is going to be my new, massively-integrated library, as soon as I get finished organizing.

The iPod

Since my iPod contains several albums that never made it to the music share for one reason or another, it can also be considered ‘authoritative.’ So I ripped its contents to another folder in the new music share, called /iPod/. I used the excellent tool SharePod to do this, as it allowed me to rip the tracks to artist/album folders with very little hassle.

Other Sources

I then rounded up all my other music, and put it into an ‘unsorted’ directory. This is stuff I would go through item by item, once the two primary sources were sorted out, and include or not include depending on if it wound up on my iPod or not. I have yet to get all the way through this step.

The Script

This is the important bit. I wrote a Python script to crawl through the two directories in parallel, and note any missing files or directories. This way, I’ll know what I need to copy from the /iPod/ folder to the /library/ folder. It’s a fairly simple command-line script, used like this:

compare.py left right outfile [filter1,filter2...]

left is the first directory, right is the second. outfile is a text file that the differences will be written to, and the [filter]s allow me to specify a whitelist of file types I care about. In this case, the whitelist would be restricted to audio file types. Here is the command I wound up running (drive Y:\ is the share I set up):

compare.py Y:\library\ Y:\iPod\ Y:\results.txt mp3,m4a

This ran the Python script, comparing the /library/ and /iPod/ directories (and, recursively, their children), saving the log of all the differences to results.txt at the root of the share. Additionally, the program ignored any files except mp3 or m4a files (and directories, obviously). I wound up with a list of all the folders and files unique to the initial library and the one copied from my iPod. Then it was a simple matter to copy the iPod-unique folders to the library. I could even use it to update my iPod if I really wanted to, although it’s running pretty close to full now.

Of course, there’s still a lot of work to do: I’ve got to tag the /unsorted/ files. Have I mentioned how meticulous I am about my music library?

Source Code

import os # for files and paths
import sys # for command line arguments
 
def matches (path, fileName, filter):
    """Returns true if the given file matches the filter or is a directory, false otherwise.
    path - the directory the file resides in
    fileName - the name of the file in question
    filter - Either None to indicate no filtering should be applied, or a list of allowed extensions."""
    if filter == None:
        return True
    else:
        # if it's a directory, return true
        if (os.path.isdir(os.path.join(path, fileName))):
            return True
        ext = fileName.split(".").pop()
        return (ext in filter)
 
 
def compareDirectories (leftPath, rightPath, uniqueLeft, uniqueRight, filter = None):
    """Recursive function to compare the contents of two given directories. Two lists are
supplied to keep track of the unique files. An optional filter is allowed.
    leftPath - The path to the first directory.
    rightPath - The path to the second directory.
    uniqueLeft - A master list of files unique to the left directory tree.
    uniqueRight - A master list of files unique to the right directory tree.
    filter - Either None, or a list of allowed (whitelist) extensions for files. A unique file in
            either the left or right directory will not be counted as unique if its extension
            does not match one of the filter items."""
 
    # get contents of directories
    left = sorted(os.listdir(leftPath));
    right = sorted(os.listdir(rightPath));
 
    # without a filter, just add all unique files
    if (filter == None):
        # append unique files by using a list comprehension to get all files on one side
        # that are not on the other side
        uniqueLeft[len(uniqueLeft):] = [os.path.join(leftPath, fileName) for fileName in left if fileName not in right]
        uniqueRight[len(uniqueRight):] = [os.path.join(rightPath, fileName) for fileName in right if fileName not in left]
    # otherwise, use the filter function
    else:
        # same as above, but also checks to see that the files match the given filters
        uniqueLeft[len(uniqueLeft):] = [os.path.join(leftPath, fileName) for fileName in left
                                        if fileName not in right and matches(leftPath, fileName, filter)]
        uniqueRight[len(uniqueRight):] = [os.path.join(rightPath, fileName) for fileName in right
                                          if fileName not in left and matches(rightPath, fileName, filter)]
 
    # get a list of files in both directories. Since they by definition must be in both,
    # we can pull them from either side using a list comprehension to check that they're
    # in the other.
    both = [fileName for fileName in left if fileName in right]
 
    # now go through and recursively call the function for any directories in both parent directories
    for fileName in both:
        leftChild = os.path.join(leftPath, fileName)
        rightChild = os.path.join(rightPath, fileName)
        if (os.path.isdir(leftChild) and os.path.isdir(rightChild)):
            compareDirectories(leftChild, rightChild, uniqueLeft, uniqueRight, filter)
 
def usage ():
    print "\n\ncompare.py"
    print "Compares two directories recursively and lists files or folders unique to each one.\n"
    print "compare.py left right outfile [filter1,filter2...]"
    print "\tleft\tFirst directory to compare"
    print "\tright\tSecond directory to compare"
    print "\toutfile\tText file that results are written to"
    print "\t[filter1,filter2]\tOptional comma-separated whitelist"
    print "\t\t\t\tof extensions for files"
    exit()
 
if __name__ == "__main__":
    # slice off name of program from args
    args = sys.argv[1:]
 
    # if there's an incorrect number of parameters, print the usage
    if len(args) < 3 or len(args) > 4:
        usage()
 
    # set up filter whitelist, if any
    filter = None
    if len(args) == 4:
        filter = args[3].split(",")
 
    # set up lists of unique files on both sides
    uniqueRight = list();
    uniqueLeft = list();
 
    # do the comparison recursively
    compareDirectories(args[0], args[1], uniqueLeft, uniqueRight, filter)
 
    # write to the file
    out = open(args[2], 'w')
 
    out.write("UNIQUE TO LEFT:\n")
    for fileName in uniqueLeft:
        out.write(fileName + "\n")
 
    out.write("\nUNIQUE TO RIGHT:\n")
    for fileName in uniqueRight:
        out.write(fileName + "\n")
 
    out.close()
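
For what it’s worth, Python’s standard library can do much of this comparison by itself. Below is a minimal sketch (not the script above) built on filecmp.dircmp. The function name unique_entries and the case-insensitive extension matching are my own assumptions for illustration; the original script matches extensions case-sensitively. Note that dircmp skips a few version-control names by default.

```python
import filecmp
import os


def unique_entries(left, right, extensions=None):
    """Return (left_only, right_only) path lists for two directory trees.

    extensions is an optional whitelist such as ["mp3", "m4a"] (lowercase,
    no dots). Directories always pass the filter, as in the script above.
    """
    def keep(parent, name):
        if extensions is None or os.path.isdir(os.path.join(parent, name)):
            return True
        # Assumption: compare extensions case-insensitively
        ext = os.path.splitext(name)[1].lstrip(".").lower()
        return ext in extensions

    left_only, right_only = [], []

    def walk(node):
        left_only.extend(os.path.join(node.left, name)
                         for name in sorted(node.left_only)
                         if keep(node.left, name))
        right_only.extend(os.path.join(node.right, name)
                          for name in sorted(node.right_only)
                          if keep(node.right, name))
        # subdirs maps each common subdirectory name to its own dircmp object
        for name in sorted(node.subdirs):
            walk(node.subdirs[name])

    walk(filecmp.dircmp(left, right))
    return left_only, right_only
```

Calling unique_entries with the two folder paths and ["mp3", "m4a"] should give the same two lists the script writes to its output file, without the hand-written recursion.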

Wherein Crackers Strike

October 12th, 2009

Sometimes, even the best of us can get a good lesson in security.

Last week, I found out that my website was attacked by crackers. Notice I use the word ‘crackers,’ not ‘hackers’, because hackers are not crackers, and it’s important to maintain the distinction. From what they left I can tell that they clearly fall in the black-hat camp. If I’d gotten a warning e-mail or a message on my site to tighten my security, I would take it as a reminder to batten down the hatches. But since they just left a juicy payload, I can assume that they’re up to no good.

From looking at the files they left, I can tell they wanted continuing access to the shell account on my web host, and they wanted to do so in secret. Since I work in web programming, I’ve seen my share of more-conspicuous payloads. These are usually surreptitious JavaScript files, plopped at the end of legit PHP scripts to do nasty things. Most of what I’ve seen have been little snippets of code that act as drive-by downloaders, trying to pull malicious executables onto hapless users’ computers. The only thing that my attacker’s payload did was grant PHP execution and shell access. Not damaging to anyone who happens across a compromised site, but potentially damaging to me: it essentially gave them free rein of my SSH account.

I could speculate about what they wanted to do. There are a lot of ways to do nasty things on the web. PHP is web-aware enough to allow them to do as they please and (assuming they’ve covered their tracks properly) not get caught, either. But I doubt they were able to get much nastiness accomplished, because they executed their attack sloppily: they dropped their payload in the wrong directory. They put their files one level above my web root, meaning that the scripts were inaccessible over the web. At first I thought they may have found an exploit in my framework to allow access to anything on the filesystem, but after reviewing their code, I can see that this is not the case. I guess they wanted to get in, drop off the files, and get out. They may have even made a second attempt after determining that the first one didn’t work; I found two files with exactly the same code.

How did they do it? I’m not certain, but I have an idea. The only web app I use is WordPress, and I’m updated to the current version, so this is an unlikely point-of-entry. They would have to know about an exploit that’s not been reported yet, which is possible, but doubtful.

Much more likely is that they managed to guess or sniff my password. I’m the guilty one here, as I was using a simple password that I’ve been using for years, which had little variation, was dictionary-based, and was much too short. In addition to that, I’ve got a webcam at home that posts images fairly frequently (at regular intervals), and it used the same account as my main FTP/Shell account. As you may know, FTP passwords are sent in cleartext, so this was definitely a potential point of entry. Assuming that the password was the point of failure, I’m lucky that they didn’t do more damage, as I used the same password for shell access, MySQL, and even my web control panel, so they theoretically could have locked me out of everything. I’m hypothesizing here, but I’d guess such an attack would be counterproductive; I think they just wanted another remote-control node on the web to carry out any dirty business they happened to think up.

Of course, I took steps to ensure that things are more locked down, starting with changing every password associated with this site. I did this as soon as I found out, and before anything else, to sever any avenues they might have had to retaliate against me. Then I checked WordPress for updates, just in case there might have been an exploit I missed. Next, I updated how my webcam saves the periodic images and created a new account specifically for it. Finally, I did a quick review of my code base, making sure they hadn’t left another way to regain access. Basically, I pulled down my whole site and did a global search for any of the crackers’ favorite functions: eval(), the base64 functions, system() and friends, and file-related functions. I’ve still got to re-upload all the code to feel 100% safe again, but I’m pretty certain that nothing slipped by.
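
That global search is easy to script. Here is a rough sketch in Python of how one could do it over a downloaded copy of a site. It is not the exact search I ran: the function name find_suspicious_calls and the particular list of PHP function names are assumptions for illustration, and a match only means a line deserves a look, not that it is malicious.

```python
import os
import re

# PHP function names to flag. This list is an assumption for illustration;
# extend it (for example with file-related functions) to suit the code base.
SUSPICIOUS = re.compile(
    r"\b(eval|base64_decode|base64_encode|system|exec|shell_exec|passthru)\s*\(")


def find_suspicious_calls(root, extensions=(".php",)):
    """Yield (path, line_number, line) for each line calling a flagged function."""
    for folder, _dirs, files in os.walk(root):
        for name in files:
            if not name.lower().endswith(extensions):
                continue
            path = os.path.join(folder, name)
            # Read as bytes and decode leniently so odd encodings don't stop the scan
            with open(path, "rb") as handle:
                for number, raw in enumerate(handle, 1):
                    line = raw.decode("utf-8", "replace")
                    if SUSPICIOUS.search(line):
                        yield path, number, line.strip()
```

Looping over find_suspicious_calls on the local copy gives a list of file names and line numbers to review by hand. The word-boundary check keeps names like curl_exec from matching the plain exec pattern.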

Stay tuned, because after I’ve further reviewed what they left and when I’ve done a bit more research, I’ll post an analysis of the code itself.

Let It Snow! (early)

October 9th, 2009

Holy cow! I’m used to snow in Montana, especially earlier than you’d expect, but the earliest I remember seeing it was about a week before Halloween. Usually it’s merely a few light flakes here and there. Imagine my surprise this morning when I was eating breakfast and thought the pattern thrown on our lawn by the porch light was a bit too bright and white. I looked out the window and — sure enough! — there was a blanket of snow covering everything.

I grabbed some quick pictures after the sun came up because I wanted to try Carrie’s camera at full size (so very close to 8 megapixels). I think they came out okay. These were taken during our rush to work, so I didn’t try to do anything artsy. I just wanted to capture the moment. And what a moment! I’m going to have fun shoveling tonight…

Click the thumbnails below for a bigger view. I’ve also included links to the huge (about 3 MB) full-size images.

Dear NBC: Please Don’t Ruin Next Week’s “The Office”

October 5th, 2009

The Office premiered a few weeks ago, and it’s been a pretty good run so far this season. But there’s a ‘special event’ coming up this week (Jim and Pam’s wedding) and I’ve got a certain feeling of dread thinking about it. Let’s face it: TV networks love to let us down. So I’m asking you, NBC, from the bottom of my fanboy heart, not to ruin what should otherwise be an enjoyable and eventful episode of your fine show. I realize that the show is already in the bag, but I want to complain anyway, so I will. Got that?

Please, no drama. The Office is a comedy, after all. Drama can be good every once in a while, but you don’t need to inject it into every damn episode. This week’s show is a big one, and it would be nice if, just for once, everything could go off without a hitch. Can you imagine that? A fun episode through and through, with no cold feet or misunderstandings about such-and-such or reappearances of sketchy former boyfriends to instill a feeling of doubt or any of those other tired, old wedding clichés… it would be refreshing.

The trend over the last decade or so has been to inject drama into sitcoms, and it’s worked pretty well in general. But… there’s always the danger of too much of a good thing. Just because it can make a certain series interesting and engaging (Scrubs comes immediately to mind, ditto Pushing Daisies) doesn’t mean that every episode ever needs it. Sometimes, I just want to laugh. There once was a time when adding a bit of emotion into an otherwise funny show was a rare thing and something to be admired. But then it became a fad, and everyone started doing it. I blame Friends, and Ross and Rachel. But as it has become the norm instead of the exception, it’s become a bit old. And now we’ve come full circle, NBC, and you can do the new and different thing by not injecting some sort of crisis or epiphany or disaster into this week’s episode.

I’ve been pulling for Jim and Pam for a long time, NBC. After all, Jim is a guy I can relate to, and Pam is smokin’ hot. I just want them to be happy. The best moments on the show are the ones where we see them as a pair, happy and glad of each other’s company and relating like human beings. Yes, their drama worked early on and even drew me into the show, but now is the time for smiles and celebration. I want to see Michael be an idiot, and Dwight show some of that weird, off-putting ‘expert’ charm, and Andy fail with the ladies. I want to see all those things. But I also want to see Jim and Pam smiling and happy at the end of the episode, without some formulaic romantic comedy grade BS to foul up the hour. Is that too much to ask?

The biggest surprise of all, NBC, would be if you were to surprise me with no surprises. Just let things happen the way they should. I want a sense of finality when I turn the show off, not some lingering cloud of doom over the characters’ (and my own) heads.

Lutefisk Lament

September 12th, 2009

Today, I ended a decade-long search for a specific recording. If you’ve ever tried to search for something very specific and obscure on the web before, you might know my frustration. The recording in question was a Christmas-themed song, “Lutefisk Lament.” It’s basically a satire of “The Night Before Christmas,” about how horrible Lutefisk is.

The song has great sentimental value in my family, as we are mostly Norwegian. I’ve never had Lutefisk and probably never will, a testament to the impact the song has had on me. Our family came across it by way of a cassette tape, recorded from many novelty Christmas records my grandfather came by. He sold RCA stereos back in the time when my dad was a kid, and would occasionally be sent various promo albums. Someone along the line compiled lots of the strange or novel Christmas songs from those records onto a tape.

This is the tape I remember, although I don’t remember it well. Every Christmas season my mom would pull it out of storage, and I would listen to it on a tiny portable cassette recorder while lying on the couch and waiting for Christmas Eve. I can only remember bits and pieces about the songs that were on it. I’ve searched rather fruitlessly on the web for years, and unfortunately have come across very little. I remember some songs (“We Need a Little Christmas” from the musical Mame), but since it was a homemade tape there’s no way to positively identify any of the performers or even some of the song titles.

Every year, I google around for a bit, and usually come up short. Today, I started working on the annual Christmas album I record for my mom (if you haven’t gathered by now, she’s the parent who’s really into the holiday, and she definitely passed that fervor on to me). Going through lists of Christmas songs, I recalled my annual search and decided to tread through the same old Google search results, the same old blogs, and the same old dead links, desperate for the song. It’s not on Amazon and it’s not on iTunes, which is unfortunate because I would have bought it straight away if I’d found it.

This year was different. The first page of results had a link to a new page I hadn’t seen before. It mentioned a webpage which included downloads from WCCO-AM (in Minneapolis, MN). I went to the page and — lo and behold! — the track was there. I had the usual moment of fear as I worried that the link to the track would be a dead link, but it downloaded straight away, and I saved it, and now the track is mine, and it’s just as wonderful as I remember it being all those years ago, when Santa was an actual person who came down our chimney.

The song can be found on this page. Scroll down to 1980 under the first section, WCCO Air Checks. This version is done by Charlie Boone and Roger Erickson (this was one of the frustrating things I’d found out in years prior, frustrating because having the song name and performers wasn’t enough to find it). Please listen and enjoy, and spread the word if you can, because the world really needs to know about the terrors of Lutefisk.

Risk Automatic Dice Roller

August 31st, 2009

I went to Missoula last weekend to play some Risk, and saw that some RTAs had made an automatic Risk dice roller. Being the inquisitive type, I decided to write one myself. Mine uses pretty dice images I made, which somehow makes it better than the other one. It also has a few options for end-of-battle strategy.

Risk Auto Dice Roller

The source code is available, too. And because I’m awesome like that, here’s a .zip of the pretty pretty dice.

You can go to the page to see all the options.
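
For the curious, the core of any such roller is short. Here is a rough Python sketch of one round of dice under the standard Risk rules (attacker rolls up to three dice, defender up to two, highest dice are compared in pairs, ties go to the defender). This is not the source of my roller; the function names and the simple "fight to the end" stopping rule are assumptions for illustration.

```python
import random


def roll(count, rng=random):
    """Roll `count` six-sided dice and return them highest first."""
    return sorted((rng.randint(1, 6) for _ in range(count)), reverse=True)


def resolve_round(attack_dice, defend_dice):
    """Compare dice pairwise, highest first; return (attacker_losses, defender_losses)."""
    attacker_losses = defender_losses = 0
    for a, d in zip(sorted(attack_dice, reverse=True),
                    sorted(defend_dice, reverse=True)):
        if a > d:
            defender_losses += 1
        else:  # ties go to the defender
            attacker_losses += 1
    return attacker_losses, defender_losses


def battle(attackers, defenders, rng=random):
    """Roll until the defender is wiped out or the attacker is down to one army."""
    while attackers > 1 and defenders > 0:
        a_loss, d_loss = resolve_round(roll(min(3, attackers - 1), rng),
                                       roll(min(2, defenders), rng))
        attackers -= a_loss
        defenders -= d_loss
    return attackers, defenders
```

Passing a seeded random.Random instance as rng makes a battle repeatable, which is handy when checking the logic by hand.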

Hurting America

August 21st, 2009

Remember when Jon Stewart went on CNN’s Crossfire (transcript for those who don’t want video) and told its hosts that they were “hurting America”? Truer words were never spoken, and as we get deeper and deeper into the debate on health care in America, I can see that things are getting worse. I’m thinking specifically of this incident at a ‘town hall’ meeting wherein a woman yells “Heil Hitler” at a man who is praising Israel’s health care system. Jokes about Godwin’s Law aside, is this really what the state of debate in the United States has come to?

Before I go on, I’d like to point out that I’m not exempt from this type of shameful discourse; in my younger years I was vehemently against George W. Bush, to the point of name-calling, and I’m sure some of that vehemence still exists in this blog’s archives. I’d like to think that as I became older and more mature, I left some of this partisan name-calling behind. At one time I was the very type of person I am criticizing now. Please center your Blame Ray squarely on the vagaries of youth.

The problems with our political dialogue are many, but one issue in particular is most responsible: an overwhelming majority of trouble arises because of our two-party system. Most people pick one side or the other, and the fact that there are ‘only’ two choices polarizes them. It frees the Democrats and Republicans to perpetuate an ‘us vs. them’ mentality, which is what George Washington warned about in his farewell address. Ah, Mister Washington, if only we’d listened! People treat their political affiliations like sporting teams competing against each other, and forget that actually, we’re kinda, sorta, in this all together.

If it were just a matter of hating someone’s ideas there wouldn’t be a problem. But people today get into the troubling habit of hating someone because of the party they support. It’s not “I hate him because he thinks like a Democrat,” it’s “I hate him because he is a Democrat.” We refuse to even listen to someone whose opinion differs from our own. The comforting bosom of a political party exacerbates the problem, smothering us and drowning out anything but what we already want to hear. How are we supposed to have a thoughtful, intelligent discussion when our party provides us with all the talking points and all the names we need to throw to think we’ve won?

Things got pretty bad leading up to the election. The Right labeled Obama a “Socialist” and someone who “pals around with terrorists” (with an undercurrent of racism in the subtext that he was a terrorist), while the Left labeled McCain as an “old man” approaching senility and Palin as a “pitbull with lipstick.” The name-calling ramped up to a fever pitch in the media, and the real issues we were facing got lost because nobody felt like treating the American people like adults. It was much easier to deal with the voters as if they were children. Even Saturday Night Live got a boost from the election.

Another problem that’s dragging the state of American debate down is that if we listen to our opponents, it’s perceived as weakness. This also played out in the last election, as the Right looked at Obama’s willingness to talk with North Korea and Iran as tantamount to surrendering to them. What both sides fail to recognize is that they are never going to get 100% of what they want. This thing we call political debate needs to be called compromise. Those on the Left should realize that we will probably never have a nationalized health care system. Those on the Right should realize that the Left is currently in power and they’re forcing the issue, so it’s time to negotiate instead of sitting immobile with arms crossed.

Listening to what your opponents say is a strength. Since neither Democrats nor Republicans are going to get everything they want, they need to seek out the common ground. Any kind of health care reform that is forced down our throats (as the Democrats seem poised to do) will surely fail, because so many people will want it to fail. It’s amazingly easy to learn something from what your opponent says, as well, either to further your own argument or to help sway your opinion. Examining evidence and making a decision (or even changing one you’ve made before) is the mark of a reasonable person, not the brand of a cowardly flip-flopper. Clinging to decisions you’ve made because of a gut feeling and loudly singing songs while you cover your ears is actually cowardice; it’s the mark of someone who is afraid of their opponent’s words. If the point your adversary makes is making you uncomfortable, perhaps you should re-think your position.

Above this, however, is the most infuriating result of this political culture: the willful amnesia that both sides partake of any time there is a shift in power. When Obama was elected, the Right immediately shifted from defense to offense, just like when the ball changes hands in football. Suddenly, instead of demanding that we should love America or leave it, they were crying foul about all sorts of changes the new President was planning. They forgot that mere months before they had been howling for the blood of dissenters. You can’t change your ideas of what is acceptable behavior just because of who is in charge. This shows that your affiliation lies with your party, and that your convictions aren’t convictions; they’re just talking points designed to bolster your side and weaken your opponents’ stance. I didn’t care enough about politics when Clinton left office to notice, but I’m sure a similar about-face happened as soon as Bush was sworn in. This is worse than hypocrisy. It’s flip-flopping with a vengeance, and being spiteful to boot.

There’s a certain vitriol to all this, a gleeful fanning of the flames which threaten to swallow us whole. I shudder every time I read a comment on a discussion forum that suggests that the opposition should die or suffer horribly for what they say or believe. True, much discussion online is emboldened by anonymity, and even more can be shrugged away with claims that the poster was just joking. But underlying every jest is a grain of truth, and some of the things we read online are simply terrifying. The situation is made even worse by talk radio and pundits on TV. I hate to keep using the Right as an example of this because my whole point is about looking beyond the left-vs.-right line, but I lean to the left so this is what I tend to read.

The entire culture of politics in the United States is toxic because so many people resort to name-calling as a substitute for a measured, well-reasoned back-and-forth. They draw a line in the sand, pick a side, and set out determined not to listen, not to compromise, and not to treat the issues we face like the serious business they are. There are more ratings in the childish chanting and parroting of talking points. We lose the signal of valuable discussion in the noise of the Keith Olbermanns, the Ann Coulters, the Rush Limbaughs, and the Michael Moores out there. It’s all so much hand-waving, and we’re playing right into their hands every time we hurl an epithet instead of offering an argument.

How to Win (Or Maybe Not) on Wheel of Fortune

August 19th, 2009

One nice thing about being a programmer is that you can automate certain calculations that you’d have to be crazy to attempt any other way. While some would see a non-programmer attempting to figure out some of this stuff as borderline insane, we coders just come across as eccentric with a lot of time on our hands. If people ask a question fairly frequently, and said question involves lots of number-crunching, you can bet some coder somewhere has taken a crack at trying to crunch those numbers.

Case in point: Wheel of Fortune. Now, I’ve never really been a huge fan of the show, but I see it a lot anyway. It’s on after Jeopardy!, which I really do enjoy and try to watch fairly frequently, so I’ve seen my fair share of episodes of Wheel. One thing that always got me was the final puzzle. For those of you who don’t know, this is how it works: the winning contestant from all the previous rounds must solve a shorter, harder puzzle by himself in a small span of time. He is given a (usually unhelpful) hint in the form of a category, and some of the most common letters in the English language (R, S, T, L, N, and E) are already shown. Then the contestant must choose three more consonants and one more vowel. If any of these letters occur in the puzzle, Vanna White shows them, and the player has ten seconds to guess what the word or phrase is.

There are other factors at play here, but they don’t relate to what I’m interested in most, namely: What are the best letters to pick? Can we do an analysis of the letter frequency of a whole bunch of these puzzles? Can we determine whether or not the producers of the show pay attention to these frequencies? Thanks to the Internet and some spare time in the hands of a programmer, the answer is a (qualified) yes, we can. Please note that while I do enjoy math, I am most certainly not a mathematician, so this is just an armchair analysis, and not a scholar’s take.

First, I needed a set of data. As interested as I was in determining the letter frequencies, I wasn’t about to spend six months collecting data by actually watching the end of each show. I have the Internet to do that sort of stuff for me! In this case, I found this forum, whose residents had already done the hard work. I was able to grab the final puzzles from a couple of threads on this site, and store them in some text files, one puzzle to a line. Then, I wrote a short Python script to parse through the results and generate links to a Google Charts representation of the data. If you’re going to screw around on the Internet, why waste time inputting data into Excel?
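
The counting step itself is small. As a compact sketch of the idea (a different take from my actual script, and written only for illustration), collections.Counter can tally the letters a contestant is still allowed to pick. The function name letter_counts is an assumption, and Y is treated as a consonant.

```python
from collections import Counter

GIVEN = set("rstlne")   # letters revealed for free in the final puzzle
VOWELS = set("aeiou")


def letter_counts(puzzles):
    """Count letters across puzzle solutions, split into vowels and consonants.

    The six given letters are left out, since the contestant cannot pick them.
    """
    vowels, consonants = Counter(), Counter()
    for puzzle in puzzles:
        for ch in puzzle.lower():
            # Skip spaces, punctuation, and the letters already on the board
            if not ("a" <= ch <= "z") or ch in GIVEN:
                continue
            if ch in VOWELS:
                vowels[ch] += 1
            else:
                consonants[ch] += 1
    return vowels, consonants
```

Given the lines of a puzzle file, consonants.most_common(3) and vowels.most_common(1) would then name the three consonants and one vowel that turned up most often in that data set.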

The Code

Below is the code. Please note that while I have commented it, it’s task-oriented code. I did not sit down and think things through for hours on end; I was more interested in the results produced by the code than the process of making it. To that end it may be a bit rough around the edges. If you’re a Python programmer you might even think it un-Pythonic.

from operator import itemgetter # for sorting
import sys # for command-line arguments
 
# makes sorting dictionaries prettier
def sortDictionary (s):
    return sorted(s.items(), key = itemgetter(1), reverse = True)
 
hexColors = ["F05DCF", "F4B213", "7BB5FE", "19B915",
       "C913E4", "E38080", "4891EB", "DCF725", "E02EB0",
       "EE7D18", "16D949", "73E0C9", "22F1DB", "1460A1",
       "CF8040", "FFFFFF", "CF8054", "204E00", "2B1160",
       "87513C", "DECEE9", "C913E4", "83B892", "597D4C",
       "DACA5D", "2F486B", "D79E17", "826889", "359DA1",
       "DE7A43", "568C51", "FBF786"]
 
 
if __name__ == "__main__":
    # set up command line arguments
    # thumb:   creates a smaller file, with shorter (or no, depending on letter count) labels
    # verbose: prints out each list of letters and frequencies, too
 
    if (len(sys.argv) < 2 or len(sys.argv) > 4):
        print "Usage: wof.py [filename] [t|f] [v]"
        exit()
 
    thumb = False
    verbose = False
    fileName = sys.argv[1]
    if len(sys.argv) > 2 and sys.argv[2].lower() == "t":
        thumb = True
    if len(sys.argv) > 3 and sys.argv[3].lower() == "v":
        verbose = True
 
 
    # set up lists of letters
    letters = ["a", "b", "c", "d", "e", "f", "g", "h",
               "i", "j", "k", "l", "m", "n", "o", "p",
               "q", "r", "s", "t", "u", "v", "w", "x",
               "y", "z"]
    consonants = ["b", "c", "d", "f", "g", "h",
                "j", "k", "l", "m", "n", "p",
               "q", "r", "s", "t", "v", "w", "x",
               "y", "z"]
    vowels = ["a", "e", "i", "o", "u"]
 
    # letters to exclude (already given to you on game show)
    already = ["r", "s", "t", "l", "n", "e"]
 
 
    # set up frequency dictionaries {letter : number of occurrences}
    allFrequencies = dict((letter, 0) for letter in letters)
    vowelFrequencies = dict((letter, 0) for letter in vowels)
    consonantFrequencies = dict((letter, 0) for letter in consonants)
 
    # Read the data file. Should consist of one final puzzle
    # solution per line, optionally lines can start with "#" for a comment
    file = open(fileName)
    while True:
        line = file.readline()
        if not line: break #end of loop
        if line[0] == "#": continue # skip comments
        for letter in line:
            lower = letter.lower()
            if lower in allFrequencies:
                allFrequencies[lower] = allFrequencies[lower] + 1
            if lower in already: # exclude RSTLNE from vowels and consonants
                continue # skip this letter only; a break would drop the rest of the line
            if lower in vowelFrequencies:
                vowelFrequencies[lower] = vowelFrequencies[lower] + 1
            if lower in consonantFrequencies:
                consonantFrequencies[lower] = consonantFrequencies[lower] + 1
 
    #sort dictionaries
    allFrequencies = sortDictionary(allFrequencies);
    vowelFrequencies = sortDictionary(vowelFrequencies);
    consonantFrequencies = sortDictionary(consonantFrequencies);
 
    if verbose:
        #display the lists
        print "ALL:\n", allFrequencies
        print "\nVOWELS:\n", vowelFrequencies
        print "\nCONSONANTS:\n", consonantFrequencies
 
 
    charts = {"All+Letters" : allFrequencies, "Vowels" : vowelFrequencies,
             "Consonants" : consonantFrequencies}
 
    for chart in charts:
        # make the image URLs, using Google Charts
        if thumb:
            url = "http://chart.apis.google.com/chart?chs=100x100&cht=p"
        else:
            url = "http://chart.apis.google.com/chart?chs=400x300&cht=p"
 
        # build lists for data series and its labels
        labels = []
        data = []
        for entry in charts[chart]:
            if int(entry[1]) > 0: # exclude any letters not used
                # make sure a thumbnail doesn't have too many labels to clutter it
                if thumb and len(charts[chart]) <= 6:
                    labels.append(entry[0].upper())
                else:
                    labels.append(entry[0].upper() + "+(" + str(entry[1]) + ")")
                data.append(str(entry[1]))
 
        # set them to the query string parts for data and labels
        dataRange = "&chd=t:" + ",".join(data);
        if (thumb and len(charts[chart]) >= 6):
            labelRange = ""
        else:
            labelRange = "&chl=" + "|".join(labels);
 
 
        # build the array of chart colors
        chartColors = "&chco=" + ",".join(hexColors[0:len(charts[chart])-2])
 
        # build final URL
        # build final URL (chartColors already starts with "&")
        url = url + dataRange + labelRange + chartColors + "&chtt=" + chart
        print "\n", chart, "\n", url
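
For anyone who wants to try the counting step on its own, here is a minimal sketch of it in Python 3 (the listing above uses Python 2 print statements). It is not the script I ran: the function name `count_letters` and the sample lines are made up for illustration, and it assumes Y counts as a consonant, as in the lists above.

```python
from collections import Counter

ALREADY_GIVEN = set("rstlne")  # letters handed to the contestant for free
VOWELS = set("aeiou")


def count_letters(lines):
    """Tally letters across puzzle solutions, skipping '#' comment lines.

    Returns (all_counts, vowel_counts, consonant_counts); the last two
    leave out RSTLNE.
    """
    all_counts, vowel_counts, consonant_counts = Counter(), Counter(), Counter()
    for line in lines:
        if line.startswith("#"):
            continue
        for ch in line.lower():
            if not ("a" <= ch <= "z"):
                continue  # ignore spaces, punctuation, newlines
            all_counts[ch] += 1
            if ch in ALREADY_GIVEN:
                continue  # counted overall, but not as a pick
            if ch in VOWELS:
                vowel_counts[ch] += 1
            else:
                consonant_counts[ch] += 1
    return all_counts, vowel_counts, consonant_counts


# Made-up sample input, not the real puzzle data
everything, vowels, consonants = count_letters(["# sample", "Blind Luck", "High Hopes"])
print(consonants.most_common(3))
```

A `Counter` starts every letter at zero on its own, so the three hand-built dictionaries are not needed, and `most_common()` takes the place of the sorting helper.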

The Results

Might as well show off the pretty, pretty pictures, huh?

At first look, the data is not too promising. I can give you two letters that will increase your chances of getting a ‘hit’, and one of them might come in handy. In our six months’ worth of data, O is the favorite… but not by much. Looking at both periods, it seems pretty clear that somebody at Merv Griffin Productions is responsible for distributing the vowels O, I, and A across the spectrum so none of them shows up too frequently. Notice how I and O are tied on the most recent set of data, but A is the second-most frequent vowel on the older figures. Combining all the numbers, we see that these three vowels are essentially tied in frequency, with U in a slightly lower class. But at least it’s something to work with, right? From the last six months of Wheel, it looks like ‘O’ is the best vowel to go with.

Now what about consonants? I was most excited when I pulled up the 2009 consonants graph (the first one I did), because you can clearly see that the top two letters are definitely a bit more common than the rest, and even the top three look pretty solid. H, G, and D… could those be the winning ones? My excitement faded, however, as I ran the earlier set of data through the script. Looking at the combined chart, H still has a clear lead. You’ll have no luck trying to pin down exactly which three letters to choose, but we can limit our options a little. F, G, and B are all clearly separated from the next letter (D) in the combined graph, with a decent-sized gap between them. It’s harder to say for certain, but it looks like the producers may be balancing these top four letters throughout their puzzles.

So, what to go with? You should definitely choose O for your vowel. H is the consonant that is statistically most likely to occur. Then any of F, G, and B would probably do you some good.

And how about those shifty producers? Are they gaming the final answers, so they don’t have to give out as much prize money? Are they maybe picking and choosing their phrases to deflect somebody who did a little research before heading down to the studio to play? Well, let’s try to find a pattern in the frequency of letters in the English language (please note that I’ve removed RSTLNE from these graphs):

Right away you should notice some major discrepancies between the Wheel of Fortune data set and written English. While H shows up at the top where we’d expect, D is clearly in a much higher class than the B, F, and G we picked above. In fact, F and G aren’t even in the top five, and other letters that show up often in English aren’t placed very high in the combined final puzzle data. This is probably the result of producers fine-tuning their answers over the years, either to avoid the letters contestants chose most often or to more evenly distribute the winning ones.

This is such a small data set, however, that we shouldn’t rely on it too heavily. After all, Wheel of Fortune has been on the air for twenty-six years, and we only have half a year’s worth of data, or around 2% of all that is available. But a small attempt at analyzing this data is probably better than going in blind and picking letters that you ‘think’ show up frequently.

There are other ways that this quick-and-dirty analysis can be improved. Mine is a pretty naive approach. Going over some basic rules of English might help to improve the method. For example, breaking the final puzzles down into phonemes could yield more information, as might looking at letter pairs instead of single letters. For instance, Q never occurs without U, and some letters are more common after others. This is especially useful in our task, as we need to choose three consonants but only one vowel. Consonants are most often followed by vowels, so consonant pairs increase the uniqueness of a phrase. P is often followed by R, L, or H, for example. Looking for patterns in the words themselves might also yield better predictions about what letters would be better to guess.
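
The letter-pair idea is easy enough to sketch. This is only an illustration in Python 3, not something I ran against the real data: `count_letter_pairs` is a hypothetical helper, the sample puzzles are made up, and it assumes pairs should not run across the space between two words.

```python
from collections import Counter


def count_letter_pairs(puzzles):
    """Count adjacent letter pairs inside each word (pairs never span a space)."""
    pairs = Counter()
    for puzzle in puzzles:
        for word in puzzle.lower().split():
            letters = [ch for ch in word if "a" <= ch <= "z"]
            # zip the word against itself shifted by one to get neighbors
            for first, second in zip(letters, letters[1:]):
                pairs[first + second] += 1
    return pairs


# Made-up sample input, not the real puzzle data
pairs = count_letter_pairs(["Blind Luck", "Quick Quiz"])
print(pairs.most_common(5))
```

Feeding the real solutions file through something like this would show which consonants tend to travel together, which is the information a single-letter count throws away.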

Another thing to realize is that these numbers are averages from a discrete set. Some puzzles might include the high-frequency letters and be solvable with only those (and RSTLNE), while some might not include a single one of the high-frequency picks. Picking from one of the high scorers might improve your odds of getting more letters, but it doesn’t guarantee that you’ll get some, or even any. You might wind up with something like ‘Blind Luck’, which doesn’t contain an O or an H. These estimates can help you, but only so much.
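
One way to put a number on that caveat is to count puzzles instead of letters: what fraction of solutions contain at least one of your picks? A rough Python 3 sketch follows. `hit_rate` is a made-up name, and the second sample puzzle is invented for the example.

```python
def hit_rate(puzzles, picks):
    """Fraction of puzzles containing at least one of the picked letters."""
    wanted = set(picks.lower())
    if not puzzles:
        return 0.0
    hits = sum(1 for puzzle in puzzles if wanted & set(puzzle.lower()))
    return hits / len(puzzles)


# Made-up sample input, not the real puzzle data
sample = ["Blind Luck", "High Hopes"]
print(hit_rate(sample, "ho"))    # only the second puzzle has an H or an O
print(hit_rate(sample, "hgbo"))  # the B in "Blind" rescues the first one
```

A high letter frequency and a high hit rate are not the same thing: a letter can rack up a big count by appearing many times in a few puzzles while missing most of the others entirely.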

So, after all these calculations, I now know what I would do if I ever found myself in Wheel’s final round. I’d go for H, G, and B (G and B having been arbitrarily picked over F), and then O as my vowel. And maybe I’d win big. Of course, the biggest factor in all this is your ability to manipulate letters and words in your mind. That’s one subject in which I lack skill, as evinced by the Boggle-solving program I wrote (a story for another time). So I might tank, even if my statistically-chosen letters filled out quite a bit of the puzzle. A lot of it does come down to luck, which was probably the producers’ intention all along.

How I Survived My Net Outage

August 17th, 2009

There are about fifteen points of failure between my upstairs computer and the Internet at large. So when my web access stopped working last Thursday night, I had a long way to go before I blamed my Internet Service Provider (Bresnan). Having worked at a sort of ISP for four years (the DirectConnect Office at the University of Montana), I was quite used to customers immediately blaming their provider, instead of the technology that was directly under their control. After all, they hadn’t changed anything, right, so it must be the connection itself. Or, as they each usually liked to call it, “my Internet.” Oh, that’s nice. Got your own personal Internet, huh?

The point is, I didn’t call my ISP the second something went wrong. Because the probability that it was on my end was greater than 86.525%, I started troubleshooting my way upstream. Now, between my upstairs computer and the Internet, there are a number of failure points. To wit:

  1. The computer itself connects to a D-Link wireless bridge (the model number escapes me now). This bridge works flawlessly until something goes wrong, at which point it acts like a brick with a five-port switch.
  2. Downstairs is the bridge’s counterpart, a D-Link Draft N access point. Because of the layout of my house and probably due to its age, the highest signal strength I can get on the wireless bridge upstairs is about 66%. The signal has to travel through two walls and a ceiling to get upstairs, and I have the strength set lower than maximum to keep the signal within a reasonable range of my house.
  3. From there, the signal has to go through a switch to the actual router, which is a LinkSys model running the Tomato firmware.
  4. Finally, from the router, we get to the cable modem, which plugs into the ‘real’ Internet and signals the end of my responsibility.

Between each of these nodes, of course, are the usual points of failure. This includes the hodge-podge of Ethernet cables I’ve collected throughout my life, any of which might have lost the tab that’s supposed to hold it in place. I had a bit of troubleshooting to do, but was still pretty certain that the problem rested with me.

The first thing I did was connect to the router’s web configuration. This worked right away, which eliminated most of my signal chain. Unless my connection problems were the result of some bad Voodoo (which has happened before; maybe sometime I’ll tell you the horror story of my Ethernet-Over-Power attempts to link my upstairs and downstairs network legs), it was looking more and more like Bresnan’s fault. This was confirmed when I checked the lights on my cable modem. Being a rational-thinking person, I resisted the temptation to reset my router and my cable modem. I have known people who instantly do this even if their connection is a bit slow, and I assume that the one or two times this actually worked was enough to reinforce the superstition in them. The final nail in the coffin for Bresnan was that I could release and renew my router’s IP address, but could not use my router to ping anything upstream, except my router’s router.
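
The habit of working upstream one hop at a time can be written down as a tiny Python 3 sketch. This is an illustration, not a tool I used: the hop names paraphrase my list above, `first_unreachable` is a made-up helper, and the check is a stand-in that a real script would replace with a ping or a page request.

```python
def first_unreachable(hops, reachable):
    """Walk the chain from the nearest hop outward and return the first
    hop that fails the check, or None if every hop answers."""
    for hop in hops:
        if not reachable(hop):
            return hop
    return None


# Hop names are paraphrased from the list above; the "answers" set fakes
# the results of whatever check (ping, web page) you would really run.
chain = ["wireless bridge", "access point", "switch", "router", "cable modem", "ISP"]
answers = {"wireless bridge", "access point", "switch", "router", "cable modem"}
print(first_unreachable(chain, lambda hop: hop in answers))
```

The order matters. Checking from your own end outward means that by the time you blame the provider, everything under your control has already been cleared.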

Since it was a bit late in the day, I resisted the urge to call my ISP and complain. Often people would aggravate me by calling to complain about outages at school. These outages could be beyond our control, as we were not the people in charge of the ‘pipe’ to the Internet, but were merely intermediaries between the campus’s IT department and the dorm residents. I figured that if it were a serious problem, my connection would still not be working in the morning.

As the sun was climbing in the sky I checked, and everything was working. I really didn’t give it much more thought; after all, six hours of downtime in the fourteen months I’d been a customer translated to at least three nines (99.9%) of uptime, which was pretty good for a residential service.
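
For the curious, the three-nines figure checks out. Here is the back-of-the-envelope arithmetic as a short Python 3 sketch, assuming a rough 30-day month (an approximation of mine, not a measured figure).

```python
HOURS_PER_DAY = 24
DAYS_PER_MONTH = 30  # rough assumption; real months vary


def uptime_percent(downtime_hours, months):
    """Percentage of the period during which the service was up."""
    total_hours = months * DAYS_PER_MONTH * HOURS_PER_DAY
    return 100.0 * (1 - downtime_hours / total_hours)


# Six hours down out of fourteen months (10,080 hours at 30 days per month)
print(round(uptime_percent(6, 14), 2))  # 99.94
```

Three nines allows about ten hours of downtime over that same stretch, so one six-hour outage fits with room to spare.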

I didn’t give it much thought, but my ISP did.

I’m used to being given a bad shake by corporations, but this time Bresnan came through. This weekend, I got a voicemail from an unknown number, which turned out to be a recording from them, apologizing for my downtime. This wasn’t a half-assed CYA thing, because it had obviously cost them some money: somebody had to write the words, and they had to call all the customers who were affected, which would have cost them a bit. I wasn’t expecting any sort of acknowledgement of the outage; after all, I hadn’t called to complain. The fact that they left me a message means that they left everyone who was affected a message, too. And that shows that they care. This is good for me, because a company that cares about its image is probably less likely to be a jerk to its customers.

Often in business, those who are loudest are the ones who receive the most attention. It’s good to see that every once in a while corporations can rise above the soullessness with which we often endow them, put on a human face, and treat their customers with respect.

Twitter: UR DOING IT RONG

August 13th, 2009

I’ve been using Twitter for a while now, and in that short amount of time, I’ve heard a lot about how other people are using it, too. Twitter started out as something special, but if we’re not careful, it’s going to become another part of the dregs of the Internet — a haven for spammers and friendwhores.

The allure of the site for me was the microblogging aspect. Random things pop into my head throughout the day, and some of them are serviceable enough to share. This wasn’t a problem in my old job, because I worked in the same room with a bunch of like-minded peers who often agreed with me and had more input in the same oeuvre. Now, however, I have an office, and can’t shout random observations about whatever pops into my head to my coworkers. They’re for the most part older than me, and not as interested in video games, comic books, programming, and old 80’s pop culture as I am. By microblogging on Twitter, I was potentially able to share these thoughts with like-minded individuals.

But, as with most other things on the web, people are seeing this new platform for sharing as a chance for self-promotion. This comes in two flavors, both bitter: those who want to make money off of it, and those who want to increase their status on it. Both of these reasons are wrong and worsen the site.

Those who try to make money off of it usually do it wrong. My follower count hovers around 50, going up and down by two or three accounts every day, as spammers find me, follow me and 1,000 other people, then get reported and banned. The spammers are easy to spot: every single tweet is a link, and they usually mention a) making money or b) sex, apparently the only two revenue-generating topics on the whole wide web. There are a few commercial accounts who get it right. One I’m particularly fond of is Amazon’s MP3 store (@amazonmp3), which posts a discounted album every single day. The key difference between this use of Twitter and the spammers is that they are offering something of value to me: cheap downloads of music. The spammers, on the other hand, are only trying to make money for themselves. They want to take, take, take without giving back, and they’re tearing Twitter apart.

The other kind are the friend whores, the people who do anything to get others to follow them. I’m sure one or two of the daily fluctuations in my Twitter followers come from people who follow me only because they expect to be followed back. This isn’t the way the site is supposed to work, fellas. You follow me because you think I have something interesting to say. I follow you back if I believe the same thing. This isn’t some sort of commodities market, where we trade shares in each other’s tweets. The point of Twitter isn’t to gather as many followers as possible. I have to admit that in my early days, I was guilty of exacerbating things. I would reciprocate follows. This led to trouble when I logged on and realized that I really didn’t care about so-and-so’s self-promotion or auto-generated messages about tools they were using. I would ignore their posts, and at the same time miss the point: your Twitter feed is for hearing things from people you find interesting. Your signal-to-noise ratio should be infinite, because you should follow only those people who interest you to begin with, and you shouldn’t find any noise cluttering your feed.

I’m sure Twitter is doing something to combat the spammers, but I’m not so sure about the others. Twitter has lots of neat applications, but the company can’t really help it if their site is overrun by the self-promoters. The best technology in the wrong hands (and I don’t mean evil or even malicious hands) can become worthless. I’ll continue to use Twitter the way I think it should be used, but it’s becoming harder and harder to find like-minded individuals. The thing with the web is that if one site doesn’t do what you want, there’s usually another out there gunning for it. The question is this: will Twitter realize this before it’s washed out by its own users?