Lost opportunities
Somewhere along the way of taking things apart and putting them back together again, I became interested in distributed systems.
It all started back in college when I was working on an independent study. I was reading about Beowulf clusters and found them fascinating. They required very specific hardware, though, and I was a broke college student. I thought that a generic way of creating a cluster with Java might make a good independent study course for a semester. What I wanted was a way to create tasks that could be broken apart into smaller pieces, worked on independently, and assembled back together again for a result. You might be thinking, gee, this sounds a lot like a MapReduce job in Hadoop! Well yes, you would be correct, but this project predates Hadoop by at least 4 years; I was working on it in the 2001-2002 time frame. Had I known at the time that I was onto something big, I would've kept pursuing it. Such is life.
So let's talk about how this worked. Unfortunately the source code is property of Rider University, so I can only write down what I recall from memory. Let's say you have a bunch of crappy old computers lying around that you would like to cobble together into something bigger. You would start up the agent that I wrote on each one of the nodes. These nodes would then initiate a discovery protocol to figure out who is part of the cluster. Great, so now we have a bunch of random desktops that can communicate with one another. Next we need a job queuing and distribution system. That part ran on a selected leader node. This project came before I learned about robust leader election algorithms such as Paxos and Raft, so at the time you simply had to pick a leader yourself. The leader would take the job you wrote, split it into smaller chunks, and send one chunk to each participant in the cluster. It would also keep track of combining the partial results as they came back. A rough sketch of that split-and-combine loop is below.
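Since I don't have the original source in front of me, here is only a minimal sketch of the idea from memory. The interface and class names (DivisibleJob, Leader) are my own, and local threads stand in for the networked agents, so treat this as an illustration of the shape of the system rather than the real code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// A job that can be split into independent pieces and recombined.
interface DivisibleJob<T> {
    // Break the job into roughly n independent pieces, one per node.
    List<? extends DivisibleJob<T>> split(int n);

    // Run this piece and return its partial result.
    T run();

    // Fold two partial results into one.
    T combine(T a, T b);
}

// The leader's split/distribute/combine loop. In the real system each
// piece was shipped over the network to an agent; threads stand in here.
class Leader {
    static <T> T execute(DivisibleJob<T> job, int nodes) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(nodes);
        List<Future<T>> futures = new ArrayList<>();
        for (DivisibleJob<T> piece : job.split(nodes)) {
            Callable<T> task = piece::run;
            futures.add(pool.submit(task));
        }
        T result = null;
        for (Future<T> f : futures) {
            T partial = f.get();
            result = (result == null) ? partial : job.combine(result, partial);
        }
        pool.shutdown();
        return result;
    }
}
```

The important property is that combine is associative, so the leader can fold partial results back together in whatever order the nodes finish.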
At the time I remember having at least 3 or 4 Pentium II or III systems lying around that I started my program on. The results were amazing! I timed how long a job would take on a single-core Pentium III desktop. I then told the program to split the job into 2 parts, have 2 systems work on it, and assemble the results. To my surprise, they completed the job in almost exactly half the time! I was onto something, so I decided to keep scaling it out. I added a 3rd and then a 4th computer to the cluster, and it was still scaling in a very predictable fashion.
My job specification system was too rigid, so I decided to create a plugin system for the program. It could load any Java jar file that implemented a specific interface and run the job it contained; a sketch of what that might have looked like follows.
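Again, this is a reconstruction from memory rather than the original code: the interface name (JobPlugin), its methods, and the URLClassLoader-based loading are assumptions, but they show the general pattern of pulling a class out of a jar and running it against a known interface.

```java
import java.io.File;
import java.net.URL;
import java.net.URLClassLoader;

// The interface a plugin jar is expected to implement.
interface JobPlugin {
    String name();
    Object runChunk(Object input);
}

// Loads a plugin class out of a jar file and instantiates it.
class PluginLoader {
    static JobPlugin load(File jarFile, String className) throws Exception {
        // Keep the class loader open so the plugin can lazily load any
        // other classes packaged in the same jar.
        URLClassLoader loader = new URLClassLoader(
                new URL[] { jarFile.toURI().toURL() },
                PluginLoader.class.getClassLoader());
        Class<?> cls = loader.loadClass(className);
        // The plugin class must provide a public no-argument constructor.
        return (JobPlugin) cls.getDeclaredConstructor().newInstance();
    }
}
```

Usage would be something like `PluginLoader.load(new File("wordcount.jar"), "com.example.WordCount")` (both names hypothetical), after which the leader could split and distribute whatever job the plugin defined.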
Looking back at my work from college, it is clear that had I been in Silicon Valley I would've pursued venture funding and formed a company around this idea. Hadoop today is one of the most popular distributed computing platforms in the world. Rider University can take something away from this: take a lesson from MIT and Stanford and consider helping students start companies around their projects. I sent an email to Rider asking to open source my code, since it's been ages and as far as I'm aware nothing has been done with it. I'll update this blog if that endeavor is successful.
Update 5-28-15: Good news! I talked to my independent study professor and he said it would be fine to open source the work. https://github.com/cholcombe973/GeneToolkit