music ninja

The Story of Music Suggestions Ninja

I have long been inspired by tools that can find new music, like Gnod’s self-adapting system of learning, Last.fm music recommendations, and the Hype Machine trending aggregator. I decided to try to make my own – a large task considering I know almost nothing about web applications. The result is the Music Suggestions Ninja, so named only because the domain was cheap.

The concept

My concept was simple: I wanted a single button that generates a playlist of music closely related to a given playlist. The task consists of several steps, though, and some of them are slightly complicated.

The problems

Here are the problems that I ended up needing to solve:


1) I would need a list of band names and genres.
2) The bands list needs to be clustered according to genre.
3) I would need a way to search band names that can even allow for misspelling (gotta set high standards!).
4) I would need a way to find music from the related bands on YouTube.
5) I would need to create a web app.
6) Ideally I can host this on a Raspberry Pi.

Most of these things are slightly technical, boring, and tedious, but I’ll go over some things that I thought were kind of cool and interesting and possibly (though only slightly) useful to the rest of you. If there are some things I don’t go over that you would like to know more about, please comment and let me know!

The solutions (selected)

A way to search band names that can even allow for misspelling

So given I have a list of band names, how on earth do I search for one? I want to allow for misspellings because I want to always at least try to find something. This turns out to be pretty easy. There is a reliable and fast algorithm for computing the Levenshtein distance, which is the edit distance between two words: the minimum number of single-character insertions, deletions, and substitutions needed to turn one into the other.

There are problems with just using edit distance, though, as it can be quite misleading if the words are merely rearranged. However, someone smart has already thought of this and fixed it! The solution is to use the Python package fuzzywuzzy, which can compute Levenshtein-based scores and account for all sorts of stuff, like rearrangements.
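To make the rearrangement problem concrete, here is a minimal pure-Python sketch of edit distance. It is not how fuzzywuzzy or the Levenshtein package implement things internally; the names edit_distance and sort_tokens are made up for this illustration, and sorting the words first is only a rough stand-in for the token-based scoring that fuzzywuzzy offers.

```python
def edit_distance(a, b):
    """Classic Levenshtein distance: each insertion, deletion, or substitution costs 1."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep a match)
            ))
        previous = current
    return previous[-1]


def sort_tokens(name):
    """Lowercase the name and sort its words, so word order stops mattering."""
    return " ".join(sorted(name.lower().split()))


# A one-letter typo is a single edit away.
typo = edit_distance("Van Morison", "Van Morrison")

# The same two words in the other order look far apart to plain edit distance...
swapped = edit_distance("Morrison Van", "Van Morrison")

# ...but become identical once the words are sorted before comparing.
swapped_sorted = edit_distance(sort_tokens("Morrison Van"), sort_tokens("Van Morrison"))
```

Here typo is 1 and swapped_sorted is 0, while swapped is larger than the typo case even though nothing is actually misspelled. That gap is the reason to reach for an order-insensitive score.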

So I like to use two metrics, as I found they work better than one, so my string matching code is simply:

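from fuzzywuzzy import fuzz
import Levenshtein

def compareStrings(strings):
    # strings is a (searchString, bandName) pair
    # Token-set ratio (0-100) ignores word order and repeated words
    leven1 = fuzz.token_set_ratio(strings[0],strings[1])
    # Plain Levenshtein similarity (0.0-1.0) on the raw strings
    leven2 = Levenshtein.ratio(str(strings[0]),str(strings[1]))
    return (strings[0],strings[1],leven1,leven2)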

Now how to get the best match? Here’s another problem: there can be a lot of band names, so it’s very slow to use a single loop. Here’s the solution: use more computer power! We can use the multiprocessing library to go through the band names in parallel, such as:

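import operator
from multiprocessing import Pool

def printTopTenResults(searchString):
    # Pair the search string with every band name, one comparison per pair
    stringList = []
    for bandName in bandNames:
        stringList.append((searchString,bandName))

    # Spread the comparisons over 5 worker processes
    pool = Pool(5)
    results = pool.map(compareStrings, stringList)
    pool.close()
    pool.join()

    # Sort by token-set score, then Levenshtein ratio, best first; slice before printing
    print(sorted(results, key=operator.itemgetter(2, 3), reverse=True)[:10])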

Try it out! Combine the two and test it with a tiny list of band names:

bandNames = ['Lynyrd Skynyrd','The Sex Pistols','Van Halen','Metric','Prince','Kings of Leon','The Beatles','The Monkees','Van Morrison','Jim Morrison']
printTopTenResults('Van Morrison')

So if you search for “Van Morrison” you’ll get a top result of ('Van Morrison', 100, 1.0) followed by ('Jim Morrison', 80, 0.75) and ('Van Halen', 50, 0.47619047619047616), as the two scores in each tuple decrease (each printed tuple also begins with the search string, omitted here). It works pretty well! I’m happy with it at least – it’s fast, efficient, precise, and scalable.
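Since only the ten best rows are ever shown, a full sort is more work than strictly needed once the band list gets long. As a hedged sketch of an alternative, the standard library's heapq.nlargest picks the top rows with the same two-score key. The rows below are in the shape compareStrings returns and reuse the scores quoted above; the variable names are only for illustration.

```python
import heapq
import operator

# Rows shaped like compareStrings output: (search, candidate, token score, ratio).
# The scores are the ones quoted above for the 'Van Morrison' search.
results = [
    ('Van Morrison', 'Van Halen', 50, 0.47619047619047616),
    ('Van Morrison', 'Van Morrison', 100, 1.0),
    ('Van Morrison', 'Jim Morrison', 80, 0.75),
]

# Keep only the two best rows, ranked by token score and then by ratio.
top_two = heapq.nlargest(2, results, key=operator.itemgetter(2, 3))
best_names = [row[1] for row in top_two]
```

With these rows, best_names comes out as 'Van Morrison' followed by 'Jim Morrison', the same order the sorted version gives.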

A web app that allows more than one person to connect

As I started looking at web apps I immediately settled on a simple solution: Electric Monk’s tinywapp, which is a really elegant and nice piece of code. It worked too, and I could easily use it for what I’m doing, but I ran into a problem. It takes about 3-10 seconds to find the band name, generate the tree, and find the YouTube URLs. If another person connected in the meantime, they would have to wait. At 10 seconds per request, that limits the site to about 6 people per minute.

The way around this problem is to use a WSGI HTTP server that can handle multiple workers. I used gunicorn, but I’m told that there are others out there. This way I can spawn multiple processes, so that several people can connect at once! I also adapted this for my poetry generator, which I found had a similar problem. The code for the server and the deployment process can be found on my GitHub.
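For anyone who hasn't seen the WSGI side of this, here is a minimal sketch of what gunicorn expects. It is not the actual server code from my GitHub: the module name musicapp.py, the slow_lookup function, and the use of the query string as the band name are all assumptions made up for the illustration.

```python
# Hypothetical module: musicapp.py
import time


def slow_lookup(band):
    """Stand-in for the real work (matching, tree building, YouTube search)."""
    time.sleep(0.01)  # placeholder delay; the real lookup takes seconds
    return "Playlist for " + band


def application(environ, start_response):
    """Minimal WSGI callable; every gunicorn worker process runs its own copy."""
    band = environ.get("QUERY_STRING", "") or "unknown"
    body = slow_lookup(band).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]

# Started with several workers, so one slow request does not block the others:
#   gunicorn --workers 4 musicapp:application
```

With four workers, up to four of those slow lookups can run at the same time, instead of everyone queueing behind a single process.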

I realize I haven’t gone over all the solutions to the problems up top, but I can do that in later posts if people are interested!