Posts Tagged ‘Programming’
LEGO Link Sprite
Friday, August 19th, 2011
The other day at work I was waiting for build after build and deployment after deployment and got a little bit bored, so I used some spare LEGOs to build this, a mosaic of the sprite from the original 8-bit Legend Of Zelda. Then I got back to work and MEGA-CODED. I swear I actually work at work.
Organizing The Music
Monday, October 12th, 2009
So I’ve begun the process of manually merging my music collection. It’s a mess, quite frankly. I’ve got MP3s I’ve ripped or purchased on four different computers, spread throughout many directories. Compounding this is my iPod, which usually carries the latest tracks that I’ve added. Here’s how I’m organizing things. The fun part is that I got to write a Python script to help out.
First Steps
I’ve got one folder that was my primary music folder throughout my time in school. It rests on my file server. It generally contains all my music and is the most authoritative ‘source.’ In addition, it was the initial source, the ‘seed’ if you will, for the tracks on the iPod. At one point in the distant past, my iPod contained the tracks from this folder and nothing else. This is what I’m going to start with. To really drive home the point of my fresh start, I created a share on my file server and started anew. These tracks wound up in a folder called ‘library.’
This is already a good start. I’ve been pretty meticulous in organizing my music library, essentially by artist then by album. The /library/ folder is going to be my new, massively-integrated library, as soon as I get finished organizing.
The iPod
Since my iPod contains several albums that never made it to the music share for one reason or another, it can also be considered ‘authoritative.’ So I ripped its contents to another folder in the new music share, called /iPod/. I used the excellent tool SharePod to do this, as it allowed me to rip the tracks to artist/album folders with very little hassle.
Other Sources
I then rounded up all my other music, and put it into an ‘unsorted’ directory. This is stuff I would go through item by item, once the two primary sources were sorted out, and include or not include depending on if it wound up on my iPod or not. I have yet to get all the way through this step.
The Script
This is the important bit. I wrote a Python script to crawl through the two directories in parallel, and note any missing files or directories. This way, I’ll know what I need to copy from the /iPod/ folder to the /library/ folder. It’s a fairly simple command-line script, used like this:
compare.py left right outfile [filter1,filter2...]
left is the first directory, right is the second. outfile is a text file that the differences will be written to, and the [filter]s allow me to specify a whitelist of file types I care about. In this case, the whitelist would be restricted to audio file types. Here is the command I wound up running (drive Y:\ is the share I set up):
compare.py Y:\library\ Y:\iPod\ Y:\results.txt mp3,m4a
This ran the Python script, comparing the /library/ and /iPod/ directories (and, recursively, their children), saving the log of all the differences to results.txt at the root of the share. Additionally, the program ignored any files except mp3 or m4a files (and directories, obviously). I wound up with a list of all the folders and files unique to the initial library and the one copied from my iPod. Then it was a simple matter to copy the iPod-unique folders to the library. I could even use it to update my iPod if I really wanted to, although it’s running pretty close to full now.
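For comparison, Python's standard library has a filecmp module whose dircmp class already tracks the names found on only one side of two directories. The sketch below is not the script from this post; it is a minimal alternative, assuming Python 3. The unique_entries name and the lowercase extension matching are hypothetical choices made for illustration.

```python
import filecmp
import os


def unique_entries(left, right, extensions=None):
    """Return (left_only, right_only) lists of paths for two directory trees.

    extensions is an optional whitelist such as {"mp3", "m4a"} (an assumed
    convention: lowercase, no leading dot). Directories are always kept.
    """
    def keep(parent, name):
        if extensions is None or os.path.isdir(os.path.join(parent, name)):
            return True
        return name.rsplit(".", 1)[-1].lower() in extensions

    left_only, right_only = [], []

    def walk(node):
        # dircmp exposes the names that exist on only one side
        left_only.extend(os.path.join(node.left, n)
                         for n in node.left_only if keep(node.left, n))
        right_only.extend(os.path.join(node.right, n)
                          for n in node.right_only if keep(node.right, n))
        # subdirs maps each directory present on both sides to a child dircmp
        for child in node.subdirs.values():
            walk(child)

    walk(filecmp.dircmp(left, right))
    return sorted(left_only), sorted(right_only)
```

Calling unique_entries(r"Y:\library", r"Y:\iPod", {"mp3", "m4a"}) would return the two lists that the script writes to its output file. Note that dircmp skips a few names by default (such as CVS and RCS directories), which the original script does not do.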
Of course, there’s still a lot of work to do: I’ve got to tag the /unsorted/ files. Have I mentioned how meticulous I am about my music library?
Source Code
import os   # for files and paths
import sys  # for command line arguments


def matches(path, fileName, filter):
    """Returns True if the given file matches the filter or is a directory,
    False otherwise.

    path - the directory the file resides in
    fileName - the name of the file in question
    filter - Either None to indicate no filtering should be applied, or a
             list of allowed extensions."""
    if filter is None:
        return True
    # if it's a directory, return true
    if os.path.isdir(os.path.join(path, fileName)):
        return True
    ext = fileName.split(".").pop()
    return ext in filter


def compareDirectories(leftPath, rightPath, uniqueLeft, uniqueRight, filter=None):
    """Recursive function to compare the contents of two given directories.
    Two lists are supplied to keep track of the unique files. An optional
    filter is allowed.

    leftPath - The path to the first directory.
    rightPath - The path to the second directory.
    uniqueLeft - A master list of files unique to the left directory tree.
    uniqueRight - A master list of files unique to the right directory tree.
    filter - Either None, or a list of allowed (whitelist) extensions for files.

    A unique file in either the left or right directory will not be counted
    as unique if its extension does not match one of the filter items."""
    # get contents of directories
    left = sorted(os.listdir(leftPath))
    right = sorted(os.listdir(rightPath))

    # append unique files: everything on one side that is not on the other.
    # matches() accepts every name when no filter is given.
    uniqueLeft.extend(os.path.join(leftPath, fileName) for fileName in left
                      if fileName not in right and matches(leftPath, fileName, filter))
    uniqueRight.extend(os.path.join(rightPath, fileName) for fileName in right
                       if fileName not in left and matches(rightPath, fileName, filter))

    # get a list of files in both directories. Since they by definition must
    # be in both, we can pull them from either side.
    both = [fileName for fileName in left if fileName in right]

    # now recursively call the function for any directories in both parents
    for fileName in both:
        leftChild = os.path.join(leftPath, fileName)
        rightChild = os.path.join(rightPath, fileName)
        if os.path.isdir(leftChild) and os.path.isdir(rightChild):
            compareDirectories(leftChild, rightChild, uniqueLeft, uniqueRight, filter)


def usage():
    print("\n\ncompare.py")
    print("Compares two directories recursively and lists files or folders unique to each one.\n")
    print("compare.py left right outfile [filter1,filter2...]")
    print("\tleft\tFirst directory to compare")
    print("\tright\tSecond directory to compare")
    print("\toutfile\tText file that results are written to")
    print("\t[filter1,filter2]\tOptional comma-separated whitelist")
    print("\t\t\t\tof extensions for files")
    sys.exit()


if __name__ == "__main__":
    # slice off name of program from args
    args = sys.argv[1:]

    # if there's an incorrect number of parameters, print the usage
    if len(args) < 3 or len(args) > 4:
        usage()

    # set up filter whitelist, if any
    filter = None
    if len(args) == 4:
        filter = args[3].split(",")

    # set up lists of unique files on both sides
    uniqueLeft = []
    uniqueRight = []

    # do the comparison recursively
    compareDirectories(args[0], args[1], uniqueLeft, uniqueRight, filter)

    # write to the file
    with open(args[2], "w") as out:
        out.write("UNIQUE TO LEFT:\n")
        for fileName in uniqueLeft:
            out.write(fileName + "\n")
        out.write("\nUNIQUE TO RIGHT:\n")
        for fileName in uniqueRight:
            out.write(fileName + "\n")
Wheel of Fortune Letter Frequency Analyzer
Wednesday, August 19th, 2009
How to Win (Or Maybe Not) on Wheel of Fortune
Wednesday, August 19th, 2009
One nice thing about being a programmer is that you can automate certain calculations that you’d have to be crazy to attempt any other way. While some would see a non-programmer attempting to figure out some of this stuff as borderline insane, we coders just come across as eccentric with a lot of time on our hands. If people ask a question fairly frequently, and said question involves lots of number-crunching, you can bet some coder somewhere has taken a crack at trying to crunch those numbers.
Case in point: Wheel of Fortune. Now, I’ve never really been a huge fan of the show, but I see it a lot anyway. It’s on after Jeopardy!, which I really do enjoy and try to watch fairly frequently, so I’ve seen my fair share of episodes of Wheel. One thing that always got me was the final puzzle. For those of you who don’t know, this is how it works: the winning contestant from all the previous rounds must solve a shorter, harder puzzle by himself in a small span of time. He is given a (usually unhelpful) hint in the form of a category, and some of the most common letters in the English language (R, S, T, L, N, and E) are already shown. Then the contestant must choose three more consonants and one more vowel. If any of these letters occur in the puzzle, Vanna White shows them, and the player has ten seconds to guess what the word or phrase is.
There are other factors at play here, but they don’t relate to what I’m interested in most, namely: What are the best letters to pick? Can we do an analysis of the letter frequency of a whole bunch of these puzzles? Can we determine whether or not the producers of the show pay attention to these frequencies? Thanks to the Internet and some spare time in the hands of a programmer, the answer is a (qualified) yes, we can. Please note that while I do enjoy math, I am most certainly not a mathematician, so this is just an armchair analysis, and not a scholar’s take.
First, I needed a set of data. As interested as I was in determining the letter frequencies, I wasn’t about to spend six months collecting data by actually watching the end of each show. I have the Internet to do that sort of stuff for me! In this case, I found this forum, whose residents had already done the hard work. I was able to grab the final puzzles from a couple of threads on this site, and store them in some text files, one puzzle to a line. Then, I wrote a short Python script to parse through the results and generate links to a Google Charts representation of the data. If you’re going to screw around on the Internet, why waste time inputting data into Excel?
The Code
Below is the code. Please note that while I have commented it, it’s task-oriented code. I did not sit down and think things through for hours on end; I was more interested in the results produced by the code than the process of making it. To that end it may be a bit rough around the edges. If you’re a Python programmer you might even think it un-Pythonic.
from operator import itemgetter  # for sorting
import sys  # for command-line arguments


# makes sorting dictionaries prettier
def sortDictionary(s):
    return sorted(s.items(), key=itemgetter(1), reverse=True)


hexColors = ["F05DCF", "F4B213", "7BB5FE", "19B915", "C913E4", "E38080",
             "4891EB", "DCF725", "E02EB0", "EE7D18", "16D949", "73E0C9",
             "22F1DB", "1460A1", "CF8040", "FFFFFF", "CF8054", "204E00",
             "2B1160", "87513C", "DECEE9", "C913E4", "83B892", "597D4C",
             "DACA5D", "2F486B", "D79E17", "826889", "359DA1", "DE7A43",
             "568C51", "FBF786"]

if __name__ == "__main__":
    # set up command line arguments
    # thumb: creates a smaller file, with shorter (or no, depending on letter count) labels
    # verbose: prints out each list of letters and frequencies, too
    if len(sys.argv) < 2 or len(sys.argv) > 4:
        print("Usage: wof.py [filename] [t|f] [v]")
        sys.exit()

    thumb = False
    verbose = False
    fileName = sys.argv[1]
    if len(sys.argv) > 2 and sys.argv[2].lower() == "t":
        thumb = True
    if len(sys.argv) > 3 and sys.argv[3].lower() == "v":
        verbose = True

    # set up lists of letters
    letters = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m",
               "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z"]
    consonants = ["b", "c", "d", "f", "g", "h", "j", "k", "l", "m", "n", "p",
                  "q", "r", "s", "t", "v", "w", "x", "y", "z"]
    vowels = ["a", "e", "i", "o", "u"]

    # letters to exclude (already given to you on game show)
    already = ["r", "s", "t", "l", "n", "e"]

    # set up frequency dictionaries {letter : number of occurrences}
    allFrequencies = dict((letter, 0) for letter in letters)
    vowelFrequencies = dict((letter, 0) for letter in vowels)
    consonantFrequencies = dict((letter, 0) for letter in consonants)

    # Read the data file. Should consist of one final puzzle solution per
    # line; optionally, lines can start with "#" for a comment.
    with open(fileName) as dataFile:
        for line in dataFile:
            if line.startswith("#"):
                continue  # skip comments
            for letter in line:
                lower = letter.lower()
                if lower in allFrequencies:
                    allFrequencies[lower] += 1
                if lower in already:
                    # exclude RSTLNE from vowels and consonants, but keep
                    # counting the rest of the line
                    continue
                if lower in vowelFrequencies:
                    vowelFrequencies[lower] += 1
                if lower in consonantFrequencies:
                    consonantFrequencies[lower] += 1

    # sort dictionaries (each becomes a list of (letter, count) pairs)
    allFrequencies = sortDictionary(allFrequencies)
    vowelFrequencies = sortDictionary(vowelFrequencies)
    consonantFrequencies = sortDictionary(consonantFrequencies)

    if verbose:
        # display the lists
        print("ALL:")
        print(allFrequencies)
        print("\nVOWELS:")
        print(vowelFrequencies)
        print("\nCONSONANTS:")
        print(consonantFrequencies)

    charts = {"All+Letters": allFrequencies,
              "Vowels": vowelFrequencies,
              "Consonants": consonantFrequencies}

    for chart in charts:
        # make the image URLs, using Google Charts
        if thumb:
            url = "http://chart.apis.google.com/chart?chs=100x100&cht=p"
        else:
            url = "http://chart.apis.google.com/chart?chs=400x300&cht=p"

        # build lists for data series and its labels
        labels = []
        data = []
        for entry in charts[chart]:
            if int(entry[1]) > 0:  # exclude any letters not used
                # make sure a thumbnail doesn't have too many labels to clutter it
                if thumb and len(charts[chart]) <= 6:
                    labels.append(entry[0].upper())
                else:
                    labels.append(entry[0].upper() + "+(" + str(entry[1]) + ")")
                data.append(str(entry[1]))

        # set them to the query string parts for data and labels
        dataRange = "&chd=t:" + ",".join(data)
        if thumb and len(charts[chart]) >= 6:
            labelRange = ""
        else:
            labelRange = "&chl=" + "|".join(labels)

        # build the array of chart colors
        chartColors = "&chco=" + ",".join(hexColors[0:len(charts[chart]) - 2])

        # build final URL (chartColors already starts with "&")
        url = url + dataRange + labelRange + chartColors + "&chtt=" + chart
        print()
        print(chart)
        print(url)
The Results
Might as well show off the pretty, pretty pictures, huh? Click any graph below to enlarge it.
[Graphs: Feb. – June 2009 | Sept – Jan 2008 | Combined]
At first look, the data is not too promising. I can give you two letters that will increase your chances of getting a ‘hit’, and one of them might come in handy. In our six months’ worth of data, O is the favorite… but not by much. Looking at both periods, it seems pretty clear that somebody at Merv Griffin Productions is responsible for distributing the vowels O, I, and A across the spectrum so none of them shows up too frequently. Notice how I and O are tied on the most recent set of data, but A is the second-most frequent vowel on the older figures. Combining all the numbers, we see that these three vowels are essentially tied in frequency, with U in a slightly lower class. But at least it’s something to work with, right? From the last six months of Wheel, it looks like ‘O’ is the best vowel to go with.
Now what about consonants? I was most excited when I pulled up the 2009 consonants graph (the first one I did), because you can clearly see that the top two letters are definitely a bit more common than the rest, and even the top three look pretty solid. H, G, and D… could those be the winning ones? My excitement faded, however, as I ran the earlier set of data through the script. Looking at the combined chart, H still has a clear lead, but you’ll have no luck trying to discover the three letters to choose. We can limit our options a little, though. F, G, and B are all clearly separated from the next letter (D) in the combined graph, with a decent-sized gap between them. It’s harder to say for certain, but it looks like the producers may be balancing these top four letters throughout their puzzles.
So, what to go with? You should definitely choose O for your vowel. H is the consonant that is statistically most likely to occur. Then any of F, G, and B would probably do you some good.
And how about those shifty producers? Are they gaming the final answers, so they don’t have to give out as much prize money? Are they maybe picking and choosing their phrases to deflect somebody who did a little research before heading down to the studio to play? Well, let’s try to find a pattern in the frequency of letters in the English language (please note that I’ve removed RSTLNE from these graphs):
Right away you should notice some major discrepancies between the Wheel of Fortune data set and written English. While H shows up at the top where we’d expect, D is clearly in a much higher class than B, F, and G that we picked above. In fact, F and G aren’t even in the top five, and other letters that show up often in English aren’t placed very high in the combined final puzzle data. This is probably the result of producers fine-tuning their answers over the years, either to avoid the letters contestants chose most often or to more evenly distribute the winning ones.
This is such a small data set, however, that we shouldn’t rely on it too heavily. After all, Wheel of Fortune has been on the air for twenty-six years, and we only have half a year’s worth of data, or around 2% of all that is available. But a small attempt at analyzing this data is probably better than going in blind and picking letters that you ‘think’ show up frequently.
There are other ways that this quick-and-dirty analysis can be improved. Mine is a pretty naive approach. Going over some basic rules of English might help to improve the method. For example, breaking the final puzzles down into phonemes could yield more information, as might looking at letter pairs instead of single letters. For instance, Q never occurs without U, and some letters are more common after others. This is especially useful in our task, as we need to choose three consonants but only one vowel. Consonants are most often followed by vowels, so consonant pairs increase the uniqueness of a phrase. P is often followed by R, L, or H, for example. Looking for patterns in the words themselves might also yield better predictions about what letters would be better to guess.
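As a rough illustration of the letter-pair idea, here is a minimal sketch, assuming Python 3 and the same one-puzzle-per-line data. The letter_pair_counts name and the sample puzzles are made up for the example; they are not part of the original script or data set.

```python
from collections import Counter


def letter_pair_counts(puzzles):
    """Count adjacent letter pairs within each word of each puzzle."""
    pairs = Counter()
    for puzzle in puzzles:
        for word in puzzle.lower().split():
            # drop punctuation so pairs are built from letters only
            letters = [ch for ch in word if ch.isalpha()]
            pairs.update(a + b for a, b in zip(letters, letters[1:]))
    return pairs


# Made-up sample puzzles, for illustration only.
sample = ["Quick Quiz", "Phone Prank"]
for pair, count in letter_pair_counts(sample).most_common(3):
    print(pair, count)
```

Pairs are counted inside words only, so the last letter of one word is never paired with the first letter of the next.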
Another thing to realize is that these numbers are averages from a discrete set. Some puzzles might include the high-frequency letters and be solvable with only those (and RSTLNE), while some might not include a single one of the high-frequency picks. Picking from one of the high scorers might improve your odds of getting more letters, but it doesn’t guarantee that you’ll get some, or even any. You might wind up with something like ‘Blind Luck’, which doesn’t contain an O or an H. These estimates can help you, but only so much.
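To see how much a set of picks helps on a given puzzle, rather than on average, something like the following sketch could be run over the data. It assumes Python 3; hit_rate and revealed are hypothetical helper names, and "Hot Dog" in the usage note is an invented example.

```python
def hit_rate(puzzles, picks):
    """Fraction of puzzles that contain at least one of the picked letters."""
    picks = {p.lower() for p in picks}
    puzzles = list(puzzles)
    if not puzzles:
        return 0.0
    hits = sum(1 for puzzle in puzzles if picks & set(puzzle.lower()))
    return hits / len(puzzles)


def revealed(puzzle, picks, given="rstlne"):
    """Show a puzzle as the contestant sees it: given letters plus picks."""
    shown = set(given) | {p.lower() for p in picks}
    return "".join(ch if not ch.isalpha() or ch.lower() in shown else "_"
                   for ch in puzzle)


print(revealed("Blind Luck", "hgbo"))  # Bl_n_ L___
```

With hit_rate(["Blind Luck", "Hot Dog"], "oh") the result is 0.5: the picks land in one puzzle and miss the other entirely, which is the point about averages made above.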
So, after all these calculations, I now know what I would do if I ever found myself in Wheel’s final round. I’d go for H, G, and B (G and B having been arbitrarily picked over F), and then O as my vowel. And maybe I’d win big. Of course, the biggest factor in all this is your ability to manipulate letters and words in your mind. That’s one subject in which I lack skill, as evinced by the Boggle-solving program I wrote (a story for another time). So I might tank, even if my statistically-chosen letters filled out quite a bit of the puzzle. A lot of it does come down to luck, which was probably the producers’ intention all along.
Behold the Glory that is Object-Oriented programming!
Wednesday, May 7th, 2008
So, for my final project in CS 365 (Databases), I’m designing a Wikipedia-style encyclopedia. Not very original, but it gets the job done. And it’s kind of fun to code.
Until today, however. Since this is finals week, I’ve been concentrating on other things until today. The project is due tomorrow. I already had a lot of it done (strange for me, I know, but I’m trying to get out of here), and had seen a lot of e-mails over the weekend about how the DB server we were using was going down and back up as it was fixed. I didn’t worry too much, because as of last night it was supposed to be up and strong.
Imagine my horror, then, as I logged on to the site to see what needed to be done, and got all sorts of errors, most of them involving the MySQL server’s refusal to connect. Some pages, mostly display pages, were still working. So I could browse to articles, list them, and so on, but I couldn’t create or edit them. I was understandably upset, because I also couldn’t implement the access controls or categorization features that my proposal said I would.
After some investigation, I determined that the problem arose when I requested more than one SQL connection at a time. In theory, I only needed one at a time, but my architecture was designed around a Database class, which you could have more than one of. One class for one connection. I also had some static classes for doing things like manipulating articles, categorizing them, linkifying them, and so on. All the edit pages would usually verify that the article in question exists, then call these static classes to do what they needed to do. So the calling page was creating a DB connection, then the static classes would, in order to do what they had to.
My options were looking pretty grim. Do I switch DB servers, and troubleshoot that nightmare? Do I change my whole engineering scheme with T-minus 20 hours and counting?
After some general freaking out, I realized that my pages revolved around a static call in the Database class called getConnection(), which returns a connection to the default database. The wheels in my head started turning, and I realized that since every one of my requests for a database connection goes through this method, I could somehow use it to save the day.
The trick was to create a static class member called $working, which held the connection to one, and only one, database. So I changed my code. From this:
public static function getConnection ()
{
    return new Database();
}
To this:
public static function getConnection ()
{
    if (self::$working == NULL)
        self::$working = new Database();
    return self::$working;
}
Since I essentially had a factory method to get the database connections to begin with, I merely needed to change this method so that it created the first connection, but didn’t do so on subsequent connection requests. Instead, it returns the already-existing connection. Since all my pages use this method, it means that they all use one and only one server connection. I changed the code, uploaded, and… voila! It worked perfectly.
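The same idea carries over to other languages. Here is a minimal sketch in Python 3, assuming a stand-in Database class that opens no real connection; the get_connection and _working names mirror the PHP above but are otherwise hypothetical.

```python
class Database:
    """Stand-in for the post's Database class; opens no real connection."""

    _working = None  # the single shared instance, created on first request

    @classmethod
    def get_connection(cls):
        # Create the connection lazily, then hand back the same object
        # on every later request.
        if cls._working is None:
            cls._working = cls()
        return cls._working


first = Database.get_connection()
second = Database.get_connection()
print(first is second)  # True
```

Because callers only ever go through get_connection(), none of them needs to change when the method switches from creating a new object to reusing the existing one.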
Imagine the trouble I’d be in if I hadn’t done this to begin with. This is why I love OO programming. Because if you start with well-engineered code, then a major change like this can fix everything, and break nothing. Long live Object-Oriented Programming!
Update: I would be remiss not to mention that this is the Singleton design pattern, which my buddy and classmate Dylan pointed out I had omitted from the original post.