Distributed Lame

The Problem
The Idea
My Solution
The Protocol
Installation
I would like to hear from You
Licensing
Download

The Problem

You have a huge number of .wav files (or CDs/LPs that you can convert to .wav) and a number of machines running *IX. Usually You would take Your fastest machine and software like lame, bladeenc or the Fraunhofer software to convert them to MP3s. All the other CPUs are idle and lonely :)

The Idea

Why not write a little framework to distribute the work? And while we're at it, also create a file structure in which You will find Your MP3s again. The best thing would be: "Insert an audio CD, start something, wait a little, and soon You will find Your MP3s." Another issue is that ripping a whole audio CD might take up to 700MB of disk space, which might be a problem for some systems.
A further issue is naming the MP3s and setting the tags right. But there is the CDDB for that. So the idea was to provide a framework that

My Solution

The framework I designed consists of three components:
  1. The Rlamecpu (remote lame CPU). At first I wrote something more elaborate as a Perl script for that, which is still included, but basically it is a two-line shell script that turns a machine running inetd into a remote encoder. You just have to install lame and the shell script on the machine and edit /etc/inetd.conf and /etc/services.
  2. The CPU broker (Rlamed.pl). This is a process running on one machine in the network that is responsible for managing all Rlamecpus installed in step 1. Whenever an rlameclient from point 3 wants to encode some .wav files, it asks the Rlamed for CPUs, which it receives in the form ipaddress:portnumber. During work the Rlamec sends status updates to the daemon. After the work is finished the Rlamec sends a release message so that the rlamecpu can be used by other Rlamecs. A client can also get the Rlamed to send status updates to it for monitoring purposes. Finally, the Rlamed supports dynamically adding and removing CPUs from its pool and redistributing them among clients if necessary.
  3. The Rlameclient (Rlamec.pl) finally does the work. It contacts the CPU broker and gets the songs to work on either from the audio CD in the CD-ROM drive (since I have a CD changer it actually is an array of devices) and from the CDDB, or from infofiles in a configurable directory. If there are no files in the directory and the information is fetched via the CDDB, the infofiles are created so they won't have to be fetched again if you interrupt the work. For each song it checks whether there is already a file Title_of_the_song.mp3 in the directory "Interpreten/Name of Singer" underneath a configurable start directory. If the file exists, nothing happens except that a softlink pointing to the file is created at "Interpreten/Name of Singer/Alben/Name of Album/Indexofsong_Title_of_song.mp3". Otherwise the Rlamec looks underneath another configurable directory, in a directory that is named after the CDDB ID (as is the infofile, by the way), to see whether track<index>.cdda.wav exists. If it doesn't, it rips the track from the audio CD in the correct device using cdparanoia. As ripping proceeds, a process is forked off to read the file and send it to the rlamecpu. When this is finished the MP3 is sent back and then put in the proper location as stated above. The signal handler for SIGCHLD deletes the ripped .wav file if it thinks the MP3 was received correctly, and then work begins on the next song. The Rlamec also receives status updates from its children. In a later release I might use that information to get rid of the slowest CPUs first. If there are fewer songs left to work on than there are CPUs, the Rlamec releases the CPUs it doesn't need anymore. If it receives a message that it got an additional CPU or that a CPU was deleted, it either hands a song that is not yet done to the new CPU or marks the deleted CPU to be released once it has finished its current song.
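The directory layout and the per-song decision described in point 3 can be sketched roughly as follows. This is only an illustration in Python, not the code from Rlamec.pl (which is Perl). The function names, the two-digit zero padding of the track index, and the replacement of "/" in titles are assumptions made for the sketch; only the "Interpreten/.../Alben/..." layout and the track<index>.cdda.wav naming come from the description above.

```python
from pathlib import Path


def safe_name(text):
    """Assumption: replace '/' so a title cannot escape its directory."""
    return text.replace("/", "_").strip()


def track_paths(mp3_root, artist, album, index, title):
    """Return (real MP3 file, per-album softlink) for one song."""
    artist_dir = Path(mp3_root) / "Interpreten" / safe_name(artist)
    mp3_file = artist_dir / f"{safe_name(title)}.mp3"
    # Assumption: the index is zero-padded to two digits.
    album_link = (artist_dir / "Alben" / safe_name(album)
                  / f"{index:02d}_{safe_name(title)}.mp3")
    return mp3_file, album_link


def wav_path(raw_root, cddb_id, index):
    """Ripped track inside the directory named after the CDDB ID."""
    return Path(raw_root) / cddb_id / f"track{index:02d}.cdda.wav"


def plan_track(mp3_root, raw_root, cddb_id, artist, album, index, title):
    """Decide what has to happen for one song, in the order described."""
    mp3_file, _link = track_paths(mp3_root, artist, album, index, title)
    if mp3_file.exists():
        return "link-only"        # MP3 already there, only the softlink is made
    if wav_path(raw_root, cddb_id, index).exists():
        return "encode"           # already ripped, send it to an rlamecpu
    return "rip-and-encode"       # rip with cdparanoia first, then encode
```

For example, track_paths("/mp3", "Some Singer", "Some Album", 3, "A Song") yields /mp3/Interpreten/Some Singer/A Song.mp3 as the file and /mp3/Interpreten/Some Singer/Alben/Some Album/03_A Song.mp3 as the link.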

The Protocol

The Rlamed binds to a port that is specified in its config file. If a TCP connection is made to this port, it understands the following commands (they are case insensitive):

Installation

There is one module containing some utility routines. If you want a lib and bin layout, put it in the lib directory and either call perl with -Ilib or add a BEGIN block. Otherwise keep all files in the same directory. You need cdparanoia and lame for this to work. You also need MP3::Info from CPAN. If You want to use the status viewer you also need Curses.pm. Next you should set up the rlamecpus. To do this, create an entry in /etc/services on all machines that you want to use. That entry might look like this:
lame    9999/tcp
Be sure to check that port 9999 isn't used for something else. Next edit /etc/inetd.conf and add an entry like the following:
lame    stream  tcp     nowait nobody /path/to/rlamecpu.sh rlamecpu.sh
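The check whether port 9999 is already taken in /etc/services can be sketched like this. It is a small Python helper written for illustration and is not part of the package; the function name port_users is made up here. Note that it only looks at registered service names. It does not tell you whether some program is actually listening on the port.

```python
def port_users(services_text, port, proto="tcp"):
    """Return the service names registered for port/proto in services_text."""
    names = []
    for line in services_text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments and whitespace
        fields = line.split()
        # Format per line: name  port/proto  [aliases...]
        if len(fields) >= 2 and fields[1] == f"{port}/{proto}":
            names.append(fields[0])
    return names


if __name__ == "__main__":
    with open("/etc/services") as handle:
        print(port_users(handle.read(), 9999))
```

An empty list means no service name is registered for 9999/tcp yet. A list containing only "lame" means your own entry is the only one.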
Then install rlamecpu.sh in the directory You want it to be in (whatever /path/to is in real life) and make sure you don't forget the mode bits for it. You also have to modify rlamecpu.sh if you have installed lame in a location other than /usr/local/bin. Test it by doing a "telnet localhost lame" on that machine. You should see nothing. If something is wrong the connection will be terminated.
You need three directories for this to work:
The Rlamed needs a config file, which understands two commands
There are two things you might want to modify in Rlamec.pl:
After You have set everything up, we are ready to run. First start Rlamed and test it by telnetting to the port that you specified in the config file. If the config file is in a directory other than the current one, call Rlamed with the -c option, which takes the path to the config file as its argument. You might want to send the Rlamed to the background. In the telnet session type
getstatus
You should see a list of the CPUs that you added to the config file. If You see nothing, type
addcpu ipaddr:port
with one of the CPUs you configured. Then type quit to close the connection.
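If You would rather script this check than type it into telnet, a minimal Python sketch could look like the following. Only the commands getstatus, addcpu and quit are taken from the text above. The function names, the CRLF line ending (which is what telnet sends) and the idea of simply reading until the Rlamed closes the connection are assumptions, since the reply format is not described here.

```python
import socket


def build_session(commands):
    """Join commands into the bytes a telnet session would send (CRLF assumed)."""
    return "".join(cmd.strip() + "\r\n" for cmd in commands).encode("ascii")


def query_broker(host, port, commands, timeout=5.0):
    """Send the commands to the Rlamed and return whatever it answers."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_session(commands))
        try:
            while True:
                data = sock.recv(4096)
                if not data:          # connection closed, e.g. after "quit"
                    break
                chunks.append(data)
        except socket.timeout:
            pass                      # no more data within the timeout
    return b"".join(chunks).decode("ascii", errors="replace")
```

A call such as query_broker("localhost", port_of_rlamed, ["getstatus", "quit"]) would then return the raw status text, where port_of_rlamed stands for the port from Your config file.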
Next put a CD in your drive, make sure you can reach the CDDB server you have configured, and make sure the three directories mentioned above are set up. Then type (assuming you are in the directory where you extracted the software)
perl Rlamec.pl -m mp3dir -d raw-dir -D data-dir -r localhost:portofrlamed
To be able to see what it is doing (if you have installed Curses.pm) call
perl Rlamestatus.pl -r localhost:portofrlamed
This is a client viewer. If you were fast enough, the rlamec should still be ripping the first song and You should see this in the status viewer.

Advanced usage

Rlamec.pl also understands two other options:
ripper.pl
When working with more than two fast CPUs I found that most of the time is spent ripping. So there is the ripper utility, which takes -d and -D with the same meaning as above, to rip a complete CD and create the infofiles needed. Be sure to have enough disk space if you do this. Also beware of ripping and encoding in the same directories in parallel. If Your encoding overtakes the ripping you might get incomplete MP3s.
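A quick disk space check before ripping a whole CD could be sketched like this in Python. The 700MB figure is the upper bound mentioned in "The Idea". The function names are made up for this example, and treating 1MB as 1024*1024 bytes is an assumption.

```python
import shutil

# "Up to 700MB" per audio CD; 1MB taken as 1024*1024 bytes (assumption).
CD_BYTES = 700 * 1024 * 1024


def free_bytes(path):
    """Free space on the filesystem that holds path."""
    return shutil.disk_usage(path).free


def can_rip(free, discs=1, cd_bytes=CD_BYTES):
    """True if 'free' bytes are enough for the given number of full CDs."""
    return free >= discs * cd_bytes
```

can_rip(free_bytes("raw-dir")) would then tell You whether the directory passed with -d has room for one complete CD.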

Bandwidth considerations

If you are using a 10Mbit Ethernet connection for the machine that rips and sends out the work (the machine where rlamec is running), add up the bandwidths of your remote CPUs, because it won't get any faster once you have saturated your link. Also think of your colleagues if you do this at work :)
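As a rough worked example of this addition, here is a small Python calculation. It uses the 250 KByte/sec peak of the 600 MHz PIII mentioned further down in this document, takes 1 KByte as 1000 bytes, and ignores protocol overhead and the MP3 data coming back. All of these are simplifying assumptions, and the function names are made up for the sketch.

```python
def link_bytes_per_sec(link_mbit):
    """Raw capacity of the link in bytes per second (1 Mbit = 1,000,000 bit)."""
    return link_mbit * 1_000_000 // 8


def max_useful_cpus(link_mbit, per_cpu_kbyte_s):
    """How many CPUs of the given rate fit on the link before it is full."""
    return link_bytes_per_sec(link_mbit) // (per_cpu_kbyte_s * 1000)


# 10 Mbit Ethernet, each remote CPU taking .wav data at 250 KByte/sec
print(max_useful_cpus(10, 250))
```

Under these assumptions the result is 5: about five such CPUs already fill a 10Mbit link, and adding more will not speed things up.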

I would like to hear from You

First of all: This software is quite beta. I think it works for me, but I am quite sure it has errors. So it might get stuck. If this happens, kill the Rlamed and Rlamec and check for incomplete MP3s. Afterwards start again; it should continue.
This documentation is really alpha and was written late at night. I might add a picture of the communications later. I hope you get it working anyway.
Next, I would be interested in the throughput of 'fast' CPUs. The fastest I tested was a 600 MHz PIII, which had peaks of 250 KByte/sec. So if you have other CPUs I would be interested in the rates you achieve there.
Another thing: Quite some time ago I heard of something called "Postcardware", which meant that instead of being paid, the author of the software would like to receive postcards from the users, just for the fun of getting postcards from all over the world. So if you like the software I would be happy to get some snail mail :) Snail address:
Konstantin Agouros
Otkerstr. 28
D-81547 München
Germany

Licensing

This software is free of charge for use. However, to use it (or even parts of the code!) in a commercially available product you need to obtain a license from me. It is of course meant for creating MP3s from CDs you bought. The software shall not be used to create illegal copies.
Have fun with it
Konstantin