Distributed Lame

The Problem
The Idea
My Solution
The Protocol
Installation
I would like to hear from You
Licensing
Download

The Problem

You have a huge number of .wav files (or CDs/LPs that you can convert to .wav) and a number of machines running *IX. Usually you would take your fastest machine and software like lame, bladeenc or the Fraunhofer software to convert them to MP3s. All the other CPUs are idle and lonely :)

The Idea

Why not write a little framework to distribute the work? And while we're at it, also create a file structure in which you will find your MP3s again. The best thing would be: "Insert an audio CD, start something, wait a little, and soon you will find your MP3s." Another issue is that ripping a whole audio CD might take up to 700MB of disk space, which might be a problem for some systems.
Another issue is naming the MP3s and setting the tags right. But there is the CDDB for that. So the idea was to provide a framework that

Parallelizes the work
Optionally keeps the disk space needed for ripping to a minimum
Minimizes the stuff you have to do by hand, like naming files, calling the ripper and the encoder, and sorting the files

My Solution

The framework I designed consists of three components:

The Rlamecpu (remotelamecpu). At first I wrote a somewhat larger Perl script for that, which is still included, but basically it is a two-line shell script that turns a machine running inetd into a remote encoder. You just have to install lame and the shell script on the machine and edit /etc/inetd.conf and /etc/services.
The CPU-Broker (Rlamed.pl). This is a process running on one machine in the network that is responsible for managing all the Rlamecpus installed in step 1. Whenever an Rlameclient (component 3) wants to encode some wav files, it asks the Rlamed for CPUs, which it receives in the form ipaddress:portnumber. During work the Rlamec sends status updates to the daemon. After work is finished the Rlamec sends a release message so that the Rlamecpu can be used by other Rlamecs. A client can also get the Rlamed to send status updates to it for monitoring purposes. Finally, the Rlamed supports dynamically adding and removing CPUs from its pool and redistributing them among clients if necessary.
The Rlameclient (Rlamec.pl) finally does the work. It contacts the CPU broker and gets the songs to work on either from the audio CD in the CD-ROM drive (since I have a CD changer it actually is an array of devices) together with the CDDB, or from info files in a configurable directory. If there are no files in the directory and the information is fetched via the CDDB, the info files are created so they won't have to be fetched again if you interrupt the work. For each song it checks whether there is already a file named Title_of_the_song.mp3 in the directory "Interpreten/Name of Singer" underneath a configurable start directory. If the file exists, nothing happens except that a softlink pointing to the file is created as "Interpreten/Name of Singer/Alben/Name of Album/Indexofsong_Title_of_song.mp3". Otherwise the Rlamec looks underneath another configurable directory, in a directory that is named after the cddbid (as is the info file, by the way), to see whether track<index>.cdda.wav exists. If it doesn't, it rips it from the audio CD in the correct device using cdparanoia. Once ripping has finished, a process is forked off to read the file and send it to the Rlamecpu. When encoding is finished, the MP3 is sent back and then put in the proper location as stated above. The signal handler for SIGCHLD deletes the ripped .wav file if it thinks the MP3 was received correctly, and then work begins on the next song. The Rlamec also receives status updates from its children. In a later release I might use that information to get rid of the slowest CPUs first. If there are fewer songs left to work on than there are CPUs, the Rlamec releases the CPUs it doesn't need anymore. If it receives a message that it got an additional CPU or that a CPU was deleted, it either distributes an unfinished song to the new CPU or marks the deleted CPU to be released once it has finished its current song.
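
The file layout used by the Rlameclient can be sketched roughly as follows. This is only an illustration in Python, not the code from Rlamec.pl: the function names are made up, titles are used verbatim, and it assumes the album link name ends in ".mp3".

```python
import os

def mp3_paths(mp3_dir, artist, album, index, title):
    """Return (song_file, album_link) for one song.

    Hypothetical helper: the directory names follow the description
    above, everything else is an assumption.
    """
    artist_dir = os.path.join(mp3_dir, "Interpreten", artist)
    song_file = os.path.join(artist_dir, f"{title}.mp3")
    album_link = os.path.join(artist_dir, "Alben", album, f"{index}_{title}.mp3")
    return song_file, album_link

def link_if_present(song_file, album_link):
    """Create the album softlink if the song file already exists.

    Returns True when the song needs no ripping or encoding.
    """
    if not os.path.exists(song_file):
        return False
    os.makedirs(os.path.dirname(album_link), exist_ok=True)
    if not os.path.lexists(album_link):
        os.symlink(song_file, album_link)
    return True
```

A caller would rip and encode only those songs for which link_if_present returns False.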

The Protocol

The Rlamed binds itself to a port that is specified in its config file. If a TCP connection is made to this port, it understands the following commands (they are case insensitive):

```
getcpu number
```
This requests as many CPUs as stated in number. Rlamec normally requests 5 CPUs. The answer received is either
```
WAIT
```
This means there are currently no free CPUs; as soon as one becomes free it will be allocated. The other answer is
```
CPU ipaddr:port
```
There are as many lines of this format as there are free CPUs, but at most as many as were requested.
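A client could interpret the reply to getcpu along these lines. This is a rough Python sketch, not taken from Rlamec.pl; the function name is hypothetical, and matching the reply keywords case-insensitively is an assumption.

```python
def parse_getcpu_reply(lines):
    """Return (waiting, cpus) for the reply lines of a getcpu command.

    waiting is True if the broker answered WAIT; cpus is a list of
    (ipaddr, port) tuples taken from the CPU lines.
    """
    waiting = False
    cpus = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        keyword, _, rest = line.partition(" ")
        if keyword.upper() == "WAIT":
            waiting = True
        elif keyword.upper() == "CPU":
            # Split at the last colon so the port is separated cleanly.
            host, _, port = rest.strip().rpartition(":")
            cpus.append((host, int(port)))
    return waiting, cpus
```

For example, parse_getcpu_reply(["CPU 192.168.1.1:9999"]) yields (False, [("192.168.1.1", 9999)]).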
```
getstatus
```
The answer to this command is a list of all CPUs and their status. The answer looks something like this:
```
CPU 192.168.1.1:9999@192.168.1.1:1224|Albumname|Singer's name|Songindex Songtitle
```
Which means that the CPU on 192.168.1.1:9999 is working for 192.168.1.1 (source port 1224 in its connection to the CPU broker) on song number Songindex, Songtitle, from the album Albumname by Singer's name. You get one line per CPU. If the CPU has nothing to do and is not allocated, you will just see the IP address and port number followed by idle. If it is allocated but idle (the Rlamec is ripping for another CPU), then you will see
```
cpu:port@client:port|idle
```
If the client is currently ripping a song for this CPU, you will see "ripping index" at the end of the line.
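A status viewer might split such lines as in the following Python sketch. It is an illustration only: the function name and the dictionary keys are invented here, and the "ripping index" variant is not handled because its exact layout is not spelled out above.

```python
def parse_status_line(line):
    """Split one getstatus line into a dictionary (hypothetical helper)."""
    body = line.strip()
    if body.upper().startswith("CPU "):
        body = body[4:]
    fields = body.split("|")
    # The first field is cpu:port, optionally followed by @client:port.
    cpu, _, client = fields[0].partition("@")
    status = {"cpu": cpu, "client": client or None, "state": "unknown"}
    rest = fields[1:]
    if rest and rest[0].lower() == "idle":
        status["state"] = "idle"
    elif len(rest) >= 3:
        # Album | singer | "index title"
        index, _, title = rest[2].partition(" ")
        status.update(state="busy", album=rest[0], artist=rest[1],
                      index=index, title=title)
    return status
```

An unallocated CPU shows up with client set to None and state "idle".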
```
addcpu ipaddr:port
```
This adds the CPU at ipaddr:port to the pool of available CPUs. Rlamed then tries to reallocate this CPU to either a client that has no CPUs or a client that has fewer CPUs than it requested. This client receives a message like:
```
CPU 10.10.10.10:9999
```
and then gives it some work to do. All statusreceivers (see below) get a message like
```
CPU 10.10.10.10:9999|idle
```
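The choice of a client for a newly added CPU could look roughly like this Python sketch. It is not the logic from Rlamed.pl: the function name and data structure are hypothetical, and breaking ties by insertion order is an assumption.

```python
def pick_client_for_new_cpu(clients):
    """Pick the client that should receive a newly added CPU.

    clients maps a client id to (requested, allocated) CPU counts.
    Clients with no CPU at all are preferred, then clients that hold
    fewer CPUs than they requested. Returns None if nobody needs one.
    """
    starving = [c for c, (req, have) in clients.items() if req > 0 and have == 0]
    if starving:
        return starving[0]
    short = [c for c, (req, have) in clients.items() if have < req]
    return short[0] if short else None
```

If the function returns None, the new CPU simply stays idle in the pool.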
```
delcpu ipaddr:port
```
This removes the CPU from the pool of available CPUs. If the CPU is allocated, it is deleted as soon as the client that holds it releases it. After rlamed receives this message, the client that has the CPU allocated gets the same message, so that it can mark the CPU for release once it has finished its work. Afterwards the client sends a release message to the rlamed. Once the release message is received (or if the CPU was not allocated), all status receivers get the message forwarded.
```
releasecpu ipaddr:port
```
This message can only be sent by a client that has a CPU allocated. It is sent either when there is no work left for a CPU, or when the delete message was sent to a client and the CPU has finished its work.
```
sendstatusupdates
```
This turns the client that holds the connection to the rlamed into a status receiver. Whenever the Rlamed receives a performance update (see below), this client receives a message in the same format as with getstatus, but with three more fields: the bytes already processed, the total size in bytes, and the rate in bytes/second at which this CPU is processing.
```
perfupdate ipaddr:port|bytesfinished|size|rate
```
This message is first sent by the rlamec master process to set all fields. Later on the client's children send it with the size field empty, since the client does not know the size.
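Building and reading a perfupdate line might look like the Python sketch below. The function names are hypothetical, and treating the rate as a number that may have a fractional part is an assumption.

```python
def format_perfupdate(cpu, bytes_finished, size=None, rate=0):
    """Build a perfupdate line; size=None leaves the size field empty."""
    size_field = "" if size is None else str(size)
    return f"perfupdate {cpu}|{bytes_finished}|{size_field}|{rate}"

def parse_perfupdate(line):
    """Parse a perfupdate line into a dictionary."""
    command, _, payload = line.strip().partition(" ")
    if command.lower() != "perfupdate":
        raise ValueError("not a perfupdate line")
    cpu, finished, size, rate = payload.split("|")
    return {
        "cpu": cpu,
        "bytes_finished": int(finished),
        "size": int(size) if size else None,  # empty when sent by a child
        "rate": float(rate),
    }
```

The master process would call format_perfupdate with a size, the children without one.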
```
quit
```
This quits the connection (as does closing the socket). Doing this will also release cpus the client might have allocated.

Installation

There is one module containing some utility routines. If you want a lib and bin layout, put it in the lib directory and either call perl with -Ilib or add a BEGIN block. Otherwise keep all files in the same directory. You need cdparanoia and lame for this to work. You also need MP3::Info from CPAN. If you want to use the status viewer you also need Curses.pm. Next you should set up the Rlamecpus. To do this, create an entry in /etc/services on all machines that you want to use. That entry might look like this:

lame    9999/tcp

Be sure to check that port 9999 isn't used for something else. Next edit /etc/inetd.conf and add an entry like the following

lame    stream  tcp     nowait nobody /path/to/rlamecpu.sh rlamecpu.sh

Then install rlamecpu.sh in the directory you want it to be in (whatever /path/to is in real life) and make sure you don't forget to set its mode bits. You also have to modify rlamecpu.sh if you have installed lame in a location other than /usr/local/bin. Test it by doing a "telnet localhost lame" on that machine. You should see nothing. If something is wrong the connection will be terminated. You need three directories for this to work:

datadir
In this directory Rlamec.pl creates files with a filename that is the cddbid of the CD in your cdrom (might be multiple).
raw-dir
A directory where the wav files are ripped to. In this directory there are subdirectories named after the cddbid of the CD being worked on. Inside these directories the wav files will be placed. During work you will also see files called "output.mp3.pid". These are the temporary files for the Rlamec children and they are removed if encoding was successful.
mp3-dir
This is the destinationdirectory under which the file-hierarchy for the MP3s is created.

The Rlamed needs a configfile, which understands two commands

PORT=NUMBER
With this line you specify the portnumber for Rlamed to listen for new connections.
CPU=ipaddr:port
This allows you to preconfigure CPUs instead of adding them using the addcpu command (see above). You can have multiple of these.
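
To make the config format concrete, here is a small Python sketch that reads such a file. It is not how Rlamed.pl parses it: the function name is hypothetical, and ignoring blank or unknown lines is an assumption.

```python
def parse_rlamed_config(text):
    """Return (port, cpus) from the text of an Rlamed config file."""
    port = None
    cpus = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or "=" not in line:
            continue  # assumption: skip anything that is not KEY=VALUE
        key, _, value = line.partition("=")
        key = key.strip().upper()
        value = value.strip()
        if key == "PORT":
            port = int(value)
        elif key == "CPU":
            cpus.append(value)  # kept as the ipaddr:port string
    return port, cpus
```

With the example text "PORT=7777" followed by two CPU= lines (7777 is just a made-up port), this returns the port number and a list with both ipaddr:port strings.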

There are two things you might want to modify in Rlamec.pl:

cddb-config
In line 375 (at least in this version but it is in the lower part of the code) you will find hardcoded the hostname and portnumber for the CDDB-server you want to use. You might want to set this up according to your site's requirements.
DEVICES-Array
Above the CDDB-Config You will find an array called DEVICES in which you should put the names of Your cdrom device(s)

After you have set everything up, we are ready to run. First start Rlamed and test it by telnetting to the port that you specified in the config file. If the config file is in a directory other than the current one, call Rlamed with the -c option, which takes the path to the config file as its argument. You might want to send the Rlamed to the background. In the telnet session type

getstatus

You should see a list of cpus that you added to the configfile. If You see nothing type

addcpu ipaddr:port

with one of the CPUs you configured. Then type quit to close the connection.
Next put a CD in your drive, be sure to have connectivity to the cddb you have configured and also to have set up the three directories mentioned above. Then type (assuming you are in the directory where you extracted the software to)

perl Rlamec.pl -m mp3dir -d raw-dir -D data-dir -r localhost:portofrlamed

To be able to see, what it is doing (if you have installed Curses.pm) call

perl Rlamestatus.pl -r localhost:portofrlamed

this is a client-viewer. If you were fast enough the rlamec should still be ripping the first song and You should see this in the statusviewer.

Advanced usage

Rlamec.pl also understands two other options:

-c cdromdevice
With this option just the cd in the named cdromdevice is worked on. This is the intersection of the DEVICES-array and what you enter there.
-s songindex
Just work on the song with the index that is passed as argument. If this is used without -c and you have more than one device in @DEVICES you will work on all songs with index index on all devices

ripper.pl

In working with more than two fast CPUs I found that most of the time is spent ripping. So there is the ripper utility, which takes -d and -D with the same meaning as above to rip a complete CD and create the info files needed. Be sure to have enough disk space if you do this. Also beware of ripping and encoding in the same directories in parallel. If your encoding overtakes the ripping you might get incomplete MP3s.

Bandwidth considerations

If you are using a 10Mbit Ethernet connection for the machine that rips and sends out the work (the machine where Rlamec is running), add up the bandwidths of your remote CPUs, because it won't get any faster once you have saturated your bandwidth. Also think of your colleagues if you do this at work :)
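
As a rough back-of-the-envelope calculation in Python: the sketch below assumes 1 Mbit = 1000 Kbit, ignores protocol overhead and the MP3s coming back, and uses the 250 Kbyte/sec peak rate of a 600 MHz PIII mentioned elsewhere in this document as the per-CPU rate.

```python
def cpus_to_saturate(link_mbit_per_s, cpu_rate_kbyte_per_s):
    """Number of remote CPUs whose combined input rate fills the link."""
    link_kbyte_per_s = link_mbit_per_s * 1000 / 8  # Mbit/s to Kbyte/s
    return link_kbyte_per_s / cpu_rate_kbyte_per_s

print(cpus_to_saturate(10, 250))  # prints 5.0
```

So under these assumptions about five such CPUs running at their peak rate would already fill a 10Mbit link.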

I would like to hear from You

First of all: this software is quite beta. I think it works for me but I am quite sure it has errors. So it might get stuck. If this happens, kill the Rlamed and Rlamec and check for incomplete MP3s. Afterwards start again; it should continue.
This documentation is really alpha and was written late at night. I might add a picture of the communications later. I hope you get it working anyway.
Next, I would be interested in the rates of 'fast' CPUs. The fastest I tested was a 600 MHz PIII, which had peaks of 250 Kbyte/sec. So if you have other CPUs I would be interested in the rates you achieve there.
Another thing: quite some time ago I heard of something called "Postcardware", which meant that instead of being paid, the author of the software would like to receive postcards from the users, just for the fun of getting postcards from all over the world. So if you like the software I would be happy to get some snail mail :) Snail address:
Konstantin Agouros
Otkerstr. 28
D-81547 München
Germany

Licensing

This software is free of charge for use. However to use it in a commercially available product (even parts of the code!) you need to obtain a license from me. This is of course meant to create MP3s from CDs you bought. The software shall not be used to create illegal copies of MP3s.

Have fun with it
Konstantin