So I’ve begun the process of manually merging my music collection. It’s a mess, quite frankly. I’ve got MP3s I’ve ripped or purchased on four different computers, spread throughout many directories. Compounding this is my iPod, which usually carries the latest tracks that I’ve added. Here’s how I’m organizing things. The fun part is that I got to write a Python script to help out.
First Steps
I’ve got one folder that was my primary music folder throughout my time in school. It rests on my file server. It generally contains all my music and is the most authoritative ‘source.’ In addition, it was the initial source, the ‘seed’ if you will, for the tracks on the iPod. At one point in the distant past, my iPod contained the tracks from this folder and nothing else. This is what I’m going to start with. To really drive home the point of my fresh start, I created a share on my file server and started anew. These tracks wound up in a folder called ‘library.’
This is already a good start. I’ve been pretty meticulous in organizing my music library, essentially by artist then by album. The /library/ folder is going to be my new, massively-integrated library, as soon as I get finished organizing.
The iPod
Since my iPod contains several albums that never made it to the music share for one reason or another, it can also be considered ‘authoritative.’ So I ripped its contents to another folder in the new music share, called /iPod/. I used the excellent tool SharePod to do this, as it allowed me to rip the tracks to artist/album folders with very little hassle.
Other Sources
I then rounded up all my other music, and put it into an ‘unsorted’ directory. This is stuff I would go through item by item, once the two primary sources were sorted out, and include or not include depending on if it wound up on my iPod or not. I have yet to get all the way through this step.
The Script
This is the important bit. I wrote a Python script to crawl through the two directories in parallel, and note any missing files or directories. This way, I’ll know what I need to copy from the /iPod/ folder to the /library/ folder. It’s a fairly simple command-line script, used like this:
compare.py left right outfile [filter1,filter2...]
left is the first directory, right is the second. outfile is a text file that the differences will be written to, and the [filter]s allow me to specify a whitelist of file types I care about. In this case, the whitelist would be restricted to audio file types. Here is the command I wound up running (drive Y:\ is the share I set up):
compare.py Y:\library\ Y:\iPod\ Y:\results.txt mp3,m4a
This ran the Python script, comparing the /library/ and /iPod/ directories (and, recursively, their children), saving the log of all the differences to results.txt at the root of the share. Additionally, the program ignored any files except mp3 or m4a files (and directories, obviously). I wound up with a list of all the folders and files unique to the initial library and the one copied from my iPod. Then it was a simple matter to copy the iPod-unique folders to the library. I could even use it to update my iPod if I really wanted to, although it’s running pretty close to full now.
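The copy step itself can be scripted too. What follows is only a rough sketch, not part of compare.py: the helper names read_listed_paths and copy_from are made up for illustration, and it assumes the results file lists one full path per line under the two "UNIQUE TO ..." headings. Since every entry is a full path, the sketch picks out the entries that sit under the source folder instead of relying on which heading they appear under.

```python
import os
import shutil


def read_listed_paths(results_path):
    """Return every path listed in the results file, skipping headings and blanks."""
    paths = []
    with open(results_path) as results:
        for line in results:
            line = line.strip()
            if line and not line.startswith("UNIQUE TO "):
                paths.append(line)
    return paths


def copy_from(paths, source_root, target_root):
    """Copy each listed item that lives under source_root to the same
    relative location under target_root. Returns the destinations written."""
    source_root = os.path.normpath(source_root)
    copied = []
    for path in paths:
        path = os.path.normpath(path)
        # ignore entries that belong to the other directory tree
        if not path.startswith(source_root + os.sep):
            continue
        relative = path[len(source_root) + 1:]
        destination = os.path.join(target_root, relative)
        if os.path.isdir(path):
            # a unique folder is copied whole, including any non-audio files in it
            shutil.copytree(path, destination)
        else:
            shutil.copy2(path, destination)
        copied.append(destination)
    return copied


# Hypothetical usage, matching the command above:
# listed = read_listed_paths(r"Y:\results.txt")
# copy_from(listed, r"Y:\iPod", r"Y:\library")
```

The parent folder of each copied item should already exist in the target, because the script only reports differences inside directories that are present on both sides. It is probably worth trying this on a scratch copy before pointing it at the real library.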
Of course, there’s still a lot of work to do: I’ve got to tag the /unsorted/ files. Have I mentioned how meticulous I am about my music library?
Source Code
import os   # for files and paths
import sys  # for command line arguments


def matches(path, fileName, filter):
    """Returns true if the given file matches the filter or is a directory,
    false otherwise.

    path - the directory the file resides in
    fileName - the name of the file in question
    filter - Either None to indicate no filtering should be applied, or a
             list of allowed extensions."""
    if filter is None:
        return True
    # if it's a directory, return true
    if os.path.isdir(os.path.join(path, fileName)):
        return True
    ext = fileName.split(".").pop()
    return ext in filter


def compareDirectories(leftPath, rightPath, uniqueLeft, uniqueRight, filter=None):
    """Recursive function to compare the contents of two given directories.
    Two lists are supplied to keep track of the unique files. An optional
    filter is allowed.

    leftPath - The path to the first directory.
    rightPath - The path to the second directory.
    uniqueLeft - A master list of files unique to the left directory tree.
    uniqueRight - A master list of files unique to the right directory tree.
    filter - Either None, or a list of allowed (whitelist) extensions for files.
             A unique file in either the left or right directory will not be
             counted as unique if its extension does not match one of the
             filter items."""
    # get contents of directories
    left = sorted(os.listdir(leftPath))
    right = sorted(os.listdir(rightPath))

    # append the entries found on one side but not the other. matches() accepts
    # everything when filter is None, so one pair of comprehensions covers both
    # the filtered and unfiltered cases.
    uniqueLeft.extend([os.path.join(leftPath, fileName) for fileName in left
                       if fileName not in right and matches(leftPath, fileName, filter)])
    uniqueRight.extend([os.path.join(rightPath, fileName) for fileName in right
                        if fileName not in left and matches(rightPath, fileName, filter)])

    # get a list of files in both directories. Since they by definition must be
    # in both, we can pull them from either side using a list comprehension to
    # check that they're in the other.
    both = [fileName for fileName in left if fileName in right]

    # now go through and recursively call the function for any directories in
    # both parent directories
    for fileName in both:
        leftChild = os.path.join(leftPath, fileName)
        rightChild = os.path.join(rightPath, fileName)
        if os.path.isdir(leftChild) and os.path.isdir(rightChild):
            compareDirectories(leftChild, rightChild, uniqueLeft, uniqueRight, filter)


def usage():
    print("\n\ncompare.py")
    print("Compares two directories recursively and lists files or folders unique to each one.\n")
    print("compare.py left right outfile [filter1,filter2...]")
    print("\tleft\tFirst directory to compare")
    print("\tright\tSecond directory to compare")
    print("\toutfile\tText file that results are written to")
    print("\t[filter1,filter2]\tOptional comma-separated whitelist")
    print("\t\t\t\tof extensions for files")
    sys.exit()


if __name__ == "__main__":
    # slice off name of program from args
    args = sys.argv[1:]

    # if there's an incorrect number of parameters, print the usage
    if len(args) < 3 or len(args) > 4:
        usage()

    # set up filter whitelist, if any
    filter = None
    if len(args) == 4:
        filter = args[3].split(",")

    # set up lists of unique files on both sides
    uniqueLeft = []
    uniqueRight = []

    # do the comparison recursively
    compareDirectories(args[0], args[1], uniqueLeft, uniqueRight, filter)

    # write to the file
    with open(args[2], 'w') as out:
        out.write("UNIQUE TO LEFT:\n")
        for fileName in uniqueLeft:
            out.write(fileName + "\n")
        out.write("\nUNIQUE TO RIGHT:\n")
        for fileName in uniqueRight:
            out.write(fileName + "\n")
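One thing to watch in the extension check: it compares the text after the last dot exactly as typed, so a file named TRACK.MP3 would be skipped by an mp3 filter, and a file with no dot at all is compared by its whole name. A possible variation is sketched below. The helper name has_allowed_extension is invented for this example and is not part of the script above; it leans on os.path.splitext and lowercases both sides before comparing.

```python
import os


def has_allowed_extension(file_name, allowed):
    """Return True if file_name ends in one of the allowed extensions,
    ignoring case. Files without an extension never match."""
    extension = os.path.splitext(file_name)[1].lstrip(".").lower()
    if not extension:
        return False
    # normalize the whitelist the same way, tolerating entries like ".MP3"
    wanted = set(item.strip().lstrip(".").lower() for item in allowed)
    return extension in wanted
```

To try it out, the last two lines of matches could be swapped for a call to this helper, keeping the directory check in front of it.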