Distributed Lame
- The Problem
- The Idea
- My Solution
- The Protocol
- Installation
- I would like to hear from You
- Licensing
- Download
You have a huge number of .wav files (or CDs/LPs that you can convert to .wav) and a number
of machines running *IX. Usually you would take your fastest machine and software like lame, bladeenc
or the Fraunhofer software to convert them to MP3s. All the other CPUs sit idle and lonely :)
Why not write a little framework to distribute the work? And while we're at it, also create a file structure
in which you will find your MP3s again. The best thing would be: "Insert an audio CD, start something, wait a little,
and soon you will find your MP3s." Another issue is that ripping a whole audio CD might take up to 700 MB of disk space,
which might be a problem for some systems.
Another issue is naming the MP3s and setting the tags right, but there is the CDDB for that. So the idea was to
provide a framework that
- Parallelizes the work
- Optionally keeps the disk space needed for ripping to a minimum
- Minimizes the stuff you have to do by hand, like naming files, calling the ripper and the encoder, and sorting the files
The framework I designed consists of three components:
- The Rlamecpu (remote lame CPU). At first I wrote a somewhat larger Perl script for this, which is still
included, but basically it is a two-line shell script that turns a machine running inetd into a remote encoder.
You just have to install lame and the shell script on the machine and edit /etc/inetd.conf and /etc/services.
- The CPU broker (Rlamed.pl). This is a process running on one machine in the network that is responsible for
managing all the Rlamecpus set up as described in point 1. Whenever an Rlameclient (point 3) wants to encode some
wav files, it asks the Rlamed for CPUs, which it receives in the form ipaddress:portnumber. During work the Rlamec
sends status updates to the daemon. After work is finished the Rlamec sends a release message so that
the Rlamecpu can be used by other Rlamecs. A client can also ask the Rlamed to send it status updates for
monitoring purposes. Finally, the Rlamed supports dynamically adding and removing CPUs from its pool and redistributing
them among clients if necessary.
- The Rlameclient (Rlamec.pl) finally does the work. It contacts the CPU broker and gets the songs to work on
either from the audio CD in the CD-ROM drive (since I have a CD changer it actually is an array of devices) and from the
CDDB, or from info files in a configurable directory. If there are no info files in the directory and the information
is fetched from the CDDB, the info files are created so they won't have to be fetched again if you interrupt the work.
For each song it checks whether there is already a file named Title_of_the_song.mp3 in the directory
"Interpreten/Name of Singer" underneath a configurable start directory. If the file exists, nothing happens except that
a softlink pointing to the file is created at "Interpreten/Name of Singer/Alben/Name of Album/Indexofsong_Title_of_song.mp3".
Otherwise the Rlamec looks underneath another configurable directory, in a directory that
is named after the cddbid (as is the info file, by the way), for track<index>.cdda.wav. If it doesn't exist, the Rlamec
rips it from the audio CD in the correct device using cdparanoia. Once the track is ripped, a process is forked
off to read the file and send it to the Rlamecpu. When encoding is finished, the MP3 is sent back and put in the proper
location as stated above. The signal handler for SIGCHLD deletes the ripped .wav file if it thinks the MP3 was
received correctly, and then work begins on the next song. The Rlamec also receives status updates from its children.
In a later release I might use that information to get rid of the slowest CPUs first. If there are fewer songs left
to work on than there are CPUs, the Rlamec releases the CPUs it doesn't need anymore. If it receives a message
that it got an additional CPU or that a CPU was deleted, it either hands an unfinished song to the new CPU or marks
the deleted CPU to be released once it has finished its current song.
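The directory layout described for the Rlameclient can be sketched in a few lines of Python. This is only an illustration and not the code from Rlamec.pl: the function names are made up, the link name is assumed to end in ".mp3", and the song index is used as passed in, without any padding.

```python
from pathlib import Path


def target_paths(start_dir, singer, album, index, title):
    """Return (mp3_file, album_link) for one song.

    Hypothetical helper: mirrors the layout from the text,
    it is not taken from Rlamec.pl.
    """
    artist_dir = Path(start_dir) / "Interpreten" / singer
    mp3_file = artist_dir / f"{title}.mp3"
    album_link = artist_dir / "Alben" / album / f"{index}_{title}.mp3"
    return mp3_file, album_link


def link_into_album(mp3_file, album_link):
    """Create the album softlink unless something is already there."""
    album_link.parent.mkdir(parents=True, exist_ok=True)
    if not (album_link.is_symlink() or album_link.exists()):
        album_link.symlink_to(mp3_file)
```

With a start directory of /mp3, song 3 "Title" from "Album" by "Singer" would live in /mp3/Interpreten/Singer/Title.mp3 and be linked from /mp3/Interpreten/Singer/Alben/Album/3_Title.mp3.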
The Rlamed binds to a port that is specified in its config file. When a TCP connection is made to this port,
it understands the following commands (they are case insensitive):
getcpu number
This requests as many CPUs as given in number. Rlamec normally requests 5 CPUs. The answer received is either
WAIT
which means there are currently no free CPUs and one will be allocated as soon as it becomes free, or
CPU ipaddr:port
There are as many lines of this format as there are free CPUs, but at most as many as were requested.
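A client could interpret the answer to getcpu roughly like this. The sketch is in Python and only covers the parsing; the function name is hypothetical, and it assumes one answer per line and accepts the keywords in any case, which the text only promises for commands.

```python
def parse_getcpu_reply(lines):
    """Return a list of (ip, port) tuples; an empty list stands for WAIT."""
    cpus = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.upper() == "WAIT":
            # No free CPU right now; the broker allocates one later.
            return []
        keyword, _, address = line.partition(" ")
        if keyword.upper() != "CPU":
            raise ValueError(f"unexpected reply: {line!r}")
        host, _, port = address.rpartition(":")
        cpus.append((host, int(port)))
    return cpus
```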
getstatus
The answer to this command is a list of all CPUs and their status. The answer looks something like this:
CPU 192.168.1.1:9999@192.168.1.1:1224|Albumname|Singer's name|Songindex Songtitle
This means that the CPU on 192.168.1.1:9999 is working for the client 192.168.1.1 (source port 1224 in its connection to the
CPU broker), on the album Albumname by Singer's name, song number Songindex. You get one line per CPU. If the CPU has
nothing to do and is not allocated, you will just see IP address and port number followed by idle. If it is allocated
but idle (the Rlamec is ripping for another CPU), then you will see
cpu:port@client:port|idle
If the client is currently ripping a song for this CPU, you will see the message
"ripping index" at the end.
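To make the line format concrete, here is a small Python sketch that splits one getstatus line into its parts. It is an assumption-laden illustration: the function name and the dictionary keys are invented, unallocated CPUs are assumed to look like "CPU ip:port|idle", and a "|" inside an album or song name would confuse it.

```python
def parse_status_line(line):
    """Split one getstatus line into CPU, client and remaining fields."""
    text = line.strip()
    if text.upper().startswith("CPU "):
        text = text[4:]
    head, *fields = text.split("|")
    # The part before '@' is the CPU, the part after it the client.
    cpu, _, client = head.partition("@")
    return {
        "cpu": cpu,
        "client": client or None,      # None: CPU is not allocated
        "idle": fields[:1] == ["idle"],
        "fields": fields,              # album, singer, "index title", ...
    }
```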
addcpu ipaddr:port
This adds the CPU at ipaddr:port to the pool of available CPUs. Rlamed then tries to allocate this CPU to either
a client that has no CPUs or a client that has fewer CPUs than it requested. This client receives a message like:
CPU 10.10.10.10:9999
and then gives the CPU some work to do. All status receivers (see below) get a message like
CPU 10.10.10.10:9999|idle
delcpu ipaddr:port
This removes the CPU from the pool of available CPUs. If the CPU is allocated, it is deleted as soon as the client
that holds it releases it. After rlamed receives this message, the client that has allocated the CPU gets the same message
so that it marks the CPU for release once it has finished its work. Afterwards the client sends a release message to the rlamed.
Once the release message is received (or if the CPU was not allocated), all status receivers get the message forwarded.
releasecpu ipaddr:port
This message can only be sent by a client that has a CPU allocated. It is sent either when there is no work left
for a CPU or when the delete message was sent to a client and the CPU has finished its work.
sendstatusupdates
This turns the client that holds the connection to the rlamed into a status receiver. Whenever the Rlamed receives a
performance update (see below), this client receives a message in the same format as with getstatus but with three more
fields: the bytes already processed, the total number of bytes to process, and the rate in bytes/second at which
this CPU is processing.
perfupdate ipaddr:port|bytesfinished|size|rate
This message is first sent by the Rlamec master process to set all fields. Later on the client's children send it with
the size field empty, since they do not know the size.
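The two flavours of perfupdate (with and without the size field) could be built like this. This is a Python sketch with a made-up function name; it assumes plain decimal integers for the numeric fields and leaves out the line terminator, neither of which the text specifies.

```python
def format_perfupdate(cpu, bytes_finished, rate, size=None):
    """Build a perfupdate line; size=None leaves the size field empty."""
    size_field = "" if size is None else str(size)
    return f"perfupdate {cpu}|{bytes_finished}|{size_field}|{rate}"


# The master process sets all fields once ...
first = format_perfupdate("192.168.1.1:9999", 0, 0, size=41000000)
# ... and a child later reports progress without the size.
later = format_perfupdate("192.168.1.1:9999", 1048576, 250000)
```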
quit
This quits the connection (as does closing the socket). Doing this will also release cpus the client might have allocated.
There is one module containing some utility routines. If you want a lib and bin layout, put it in the
lib directory and either call perl with -Ilib or add a BEGIN block. Otherwise keep all files in the same directory.
You need cdparanoia and lame for this to work. You also need MP3::Info from CPAN. If you want to
use the status viewer you also need Curses.pm. Next you should set up the Rlamecpus. To do this,
create an entry in /etc/services on all machines that you want to use. That entry might look like this:
lame 9999/tcp
Be sure to check that port 9999 isn't used for something else. Next edit /etc/inetd.conf and add an entry like the
following:
lame stream tcp nowait nobody /path/to/rlamecpu.sh rlamecpu.sh
Then install rlamecpu.sh in the directory you want it to be in (whatever /path/to is in real life) and make sure you
don't forget the mode bits for it. You also have to modify rlamecpu.sh if you have installed lame in a location other
than /usr/local/bin. Test it by doing a "telnet localhost lame" on that machine. You should see nothing. If something
is wrong the connection will be terminated.
You need three directories for this to work:
- datadir
In this directory Rlamec.pl creates files with a filename that is the cddbid of the CD in your cdrom (might be
multiple).
- raw-dir
A directory where the wav files are ripped to. In this directory there are subdirectories named after
the cddbid of the CD being worked on. Inside these directories the wav files will be placed. During work you
will also see files called "output.mp3.pid". These are the temporary files of the Rlamec children, and they
are removed if encoding was successful.
- mp3-dir
This is the destinationdirectory under which the file-hierarchy for the MP3s is created.
The Rlamed needs a configfile, which understands two commands
- PORT=NUMBER
With this line you specify the portnumber for Rlamed to listen for new connections.
- CPU=ipaddr:port
This allows you to preconfigure CPUs instead of adding them using the addcpu command (see above). You can have
multiple of these lines.
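As an illustration of the two directives, the following Python sketch reads such a config file. It is not how Rlamed.pl parses it: the function name is invented, the keys are matched case-insensitively as a guess, and comments or other directives are not handled. The port 8888 in the usage example is just a placeholder.

```python
def parse_rlamed_config(text):
    """Return (port, cpus) from PORT=... and CPU=ipaddr:port lines."""
    port = None
    cpus = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a KEY=VALUE line: {line!r}")
        key = key.strip().upper()
        value = value.strip()
        if key == "PORT":
            port = int(value)
        elif key == "CPU":
            # CPU lines may repeat, one per preconfigured encoder.
            host, _, cpu_port = value.rpartition(":")
            cpus.append((host, int(cpu_port)))
        else:
            raise ValueError(f"unknown directive: {key}")
    return port, cpus


example = "PORT=8888\nCPU=192.168.1.1:9999\nCPU=192.168.1.2:9999\n"
port, cpus = parse_rlamed_config(example)
```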
There are two things you might want to modify in Rlamec.pl:
- cddb-config
Around line 375 (at least in this version; in any case in the lower part of the code) you will find the hostname
and port number of the CDDB server hardcoded. You might want to set this up according to your site's
requirements.
- DEVICES-Array
Above the CDDB config you will find an array called DEVICES, in which you should put the names of your CD-ROM
device(s).
After you have set everything up, we are ready to run. First start Rlamed and test it by telnetting to the port
that you specified in the config file. If the config file is not in the current directory, call
Rlamed with the -c option, which takes the path to the config file as its argument. You might want to send the Rlamed to
the background. In the telnet session type
getstatus
You should see a list of the CPUs that you added to the config file. If you see nothing, type
addcpu ipaddr:port
with one of the CPUs you configured. Then type quit to close the connection.
Next put a CD in your drive, be sure to have connectivity to the cddb you have configured and also to have set up
the three directories mentioned above. Then type (assuming you are in the directory where you extracted the software
to)
perl Rlamec.pl -m mp3dir -d raw-dir -D data-dir -r localhost:portofrlamed
To see what it is doing (if you have installed Curses.pm), call
perl Rlamestatus.pl -r localhost:portofrlamed
This is a client viewer. If you were fast enough, the Rlamec
should still be ripping the first song and you should see this in the status viewer.
Advanced usage
Rlamec.pl also understands two other options:
- -c cdromdevice
With this option only the CD in the named CD-ROM device is worked on. The devices actually used are the intersection
of the DEVICES array and what you enter here.
- -s songindex
Just work on the song with the index that is passed as argument. If this is used without -c and you have
more than one device in @DEVICES, you will work on the songs with that index on all devices.
ripper.pl
Working with more than two fast CPUs, I found that most of the time is spent ripping. So there is the
ripper utility, which takes -d and -D with the same meaning as above, to rip a complete CD and create the info files
needed. Be sure to have enough disk space if you do this. Also beware of ripping and encoding in the same directories
in parallel: if your encoding overtakes the ripping you might get incomplete MP3s.
Bandwidth considerations
If you are using a 10 Mbit Ethernet connection for the machine that rips and sends out the work (the machine where
Rlamec is running), add up the bandwidths of your remote CPUs, because it won't get any faster once you saturate your
link. Also think of your colleagues if you do this at work :)
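As a back-of-the-envelope check, the sum can be done in a couple of lines of Python. The sketch assumes the rate of a CPU is the number of wav bytes it consumes per second, takes 250 Kbyte/sec (the peak reported further down for a 600 MHz PIII) as 250,000 bytes/s, and ignores protocol overhead and the MP3 data coming back. The function name is made up.

```python
def cpus_to_saturate(link_mbit_per_s, cpu_rate_bytes_per_s):
    """How many CPUs of the given rate fill a link of the given speed."""
    link_bytes_per_s = link_mbit_per_s * 1_000_000 / 8
    return link_bytes_per_s / cpu_rate_bytes_per_s


# 10 Mbit/s is 1,250,000 bytes/s, so five CPUs at peak rate fill the link.
print(cpus_to_saturate(10, 250_000))  # → 5.0
```

In practice the usable bandwidth is lower than the nominal 10 Mbit/s, so fewer CPUs of that speed would already be enough to hit the limit.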
First of all: this software is quite beta. It works for me, but I am quite sure it has errors, so it might
get stuck. If this happens, kill the Rlamed and Rlamec and check for incomplete MP3s. Afterwards start again; it should
continue.
This documentation is really alpha and was written late at night. I might add a picture of the communication later.
I hope you get it working anyway.
Next, I would be interested in the rates of 'fast' CPUs. The fastest I tested was a 600 MHz PIII, which had peaks of
250 Kbyte/sec. So if you have other CPUs I would be interested in the rates you achieve there.
Another thing: quite some time ago I heard of something called "Postcardware", which meant that instead of paying
something, users would send the author of the software postcards, just for the fun of getting
postcards from all over the world. So if you like the software I would be happy to get some snail mail :)
Snail-Address:
Konstantin Agouros
Otkerstr. 28
D-81547 München
Germany
This software is free of charge for use. However to use it in a commercially available product (even parts of the
code!) you need to obtain a license from me. This is of course meant to create MP3s from CDs you bought. The software
shall not be used to create illegal copies of MP3s.
Have fun with it
Konstantin