Wednesday, November 21, 2012

Java 7 SE Raspberry PI Parallel Processing ARM Cluster of 32 boards

Raspberry PI cluster of 32 boards - Under Construction:

Disclaimer: For serious computational power, build your own i7-5820K box and use CUDA on an NVIDIA GTX 970. The Raspberry Pi cluster is more of a "build it and they will come" exploration exercise.

http://eclipsejpa.blogspot.ca/2015/03/haswell-intel-i7-5820k-overclock-pc.html

This article is an ongoing discussion of how to get a cluster of 32 raspberry pi boards up and running.

32 node/board Raspberry PI cluster for parallel processing experimentation

The ARM-based Raspberry Pi board is an excellent platform for investigating various parallel processing configurations. If we were looking for pure performance I would stick with an Intel Core i7 and a CUDA-capable NVIDIA GPU, because a single-core Raspberry Pi is about 40 times slower than a single core of a 2nd gen i7-2600 or 3rd gen i7-3610 (about 140 times slower than an 8-thread ForkJoin implementation). Raw speed, however, is not our goal. We need an efficient and accessible way to run multiple servers, and the Pi does this at about $70 per server (board + connectors + 16GB SD) and 4 watts per node. For example, it would take $9800 of Raspberry Pi boards with 70GB of RAM in total to equal one 3rd gen i7 at $1300 with 24GB of RAM. But we can build a cluster of 8 Raspberry Pi servers for $560, as opposed to 8 i7 boxes for about $10000.
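
The arithmetic behind those figures can be checked with a short sketch. It assumes throughput scales linearly with the number of boards (which a real cluster will not reach) and 512MB of RAM per board; the class and method names are only illustrative.

```java
// Back-of-envelope cluster sizing using the figures quoted above.
// Assumption: throughput scales linearly with board count.
class ClusterCost {
    static final int COST_PER_NODE_USD = 70;  // board + connectors + 16GB SD
    static final int WATTS_PER_NODE = 4;
    static final int RAM_PER_NODE_MB = 512;   // assumed: 512MB Rev B boards
    static final int SLOWDOWN_VS_I7 = 140;    // versus an 8-thread ForkJoin run

    static int costUsd(int nodes) { return nodes * COST_PER_NODE_USD; }
    static int ramGb(int nodes) { return nodes * RAM_PER_NODE_MB / 1024; }
    static int watts(int nodes) { return nodes * WATTS_PER_NODE; }

    public static void main(String[] args) {
        // 8 = current cluster, 32 = target cluster, 140 = boards to match one i7
        int[] sizes = {8, 32, SLOWDOWN_VS_I7};
        for (int nodes : sizes) {
            System.out.println(nodes + " nodes: $" + costUsd(nodes)
                    + ", " + ramGb(nodes) + "GB RAM, " + watts(nodes) + "W");
        }
    }
}
```

For 140 nodes this works out to $9800 and 70GB of RAM, and for 8 nodes to $560, which matches the numbers in the paragraph above.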

 We need the proper power supplies and network switches for a cluster of 32 raspberry pi boards.
 Raspberry Pi boards mount very nicely on standard breadboards using properly bent Arduino headers.


Raspberry PI Cluster
8 board Raspberry PI cluster
In this configuration I am running a research cluster of 8 Raspberry Pi boards as distributed Java EE RMI/EJB remote session bean clients of a central Oracle WebLogic 12c server (running on an i7 host).


This tutorial details how to get a networked cluster (a "bramble") of Raspberry Pi boards (eight for now) running as a single distributed auxiliary processing unit for a controlling Java EE server, ideally using Hadoop. The primary goal of this exercise is distributed experimentation. The cluster will grow as I acquire and configure more boards and work out the power distribution issues; I currently work with 8 boards and 8 spares. Work can be distributed across the cluster using a custom RPC API, such as remote stateless session beans on top of RMI, or using a formal MapReduce implementation like Hadoop, or even MPI.
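
To give a feel for the custom RPC option, here is a minimal sketch using plain java.rmi rather than EJB session beans or WebLogic. Everything in it is hypothetical: the SliceWorker interface, the sum-of-squares job, the "sliceWorker" registry name and port 1099 are placeholders, and for demonstration the registry, worker and controller all run in one JVM on the loopback address instead of on separate boards.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;

class RmiClusterSketch {
    public static void main(String[] args) throws Exception {
        // Demo only: host the registry and the worker in this JVM on loopback.
        System.setProperty("java.rmi.server.hostname", "127.0.0.1");
        Registry registry = LocateRegistry.createRegistry(1099);
        SliceWorkerImpl impl = new SliceWorkerImpl();
        SliceWorker stub = (SliceWorker) UnicastRemoteObject.exportObject(impl, 0);
        registry.rebind("sliceWorker", stub);

        // Controller side: look the worker up by name and hand it one slice.
        // On a real cluster the host would be one of the boards, not loopback.
        SliceWorker worker = (SliceWorker)
                LocateRegistry.getRegistry("127.0.0.1", 1099).lookup("sliceWorker");
        System.out.println(worker.sumOfSquares(0, 1000));

        // Unexport both remote objects so the JVM can exit.
        UnicastRemoteObject.unexportObject(impl, true);
        UnicastRemoteObject.unexportObject(registry, true);
    }
}

/** Hypothetical remote contract: a node sums the squares of one slice of a range. */
interface SliceWorker extends Remote {
    long sumOfSquares(int fromInclusive, int toExclusive) throws RemoteException;
}

/** Worker implementation that would run on each board. */
class SliceWorkerImpl implements SliceWorker {
    @Override
    public long sumOfSquares(int fromInclusive, int toExclusive) {
        long sum = 0;
        for (int i = fromInclusive; i < toExclusive; i++) {
            sum += (long) i * i;
        }
        return sum;
    }
}
```

In a real deployment each board would export its own worker, and the controller would split the range into one slice per board and add up the partial results.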

After running the Oracle embedded ARM JVM with no problems on the Rev A Raspberry Pi using the distribution from Element14, I was not immediately successful running the JVM on the new Rev B (512MB) board, because the default Debian distribution from Element14 no longer uses soft float. I get the following missing library error.

pi@raspberrypi ~/java/ejre1.7.0_06 $ java -version
java: error while loading shared libraries: libjli.so: cannot open shared object file: No such file or directory

Download a new OS compatible with the Java 7 JDK here
http://www.raspberrypi.org/downloads

Choose "Soft-float Debian “wheezy”" = 2012-08-08-wheezy-armel.zip

Write it to your SD card
https://launchpad.net/win32-image-writer/+download

Reinstall Java (currently 1.7.0_10 from Oracle)

You are good to go [Java 7 SE on the Raspberry Pi Rev B]. We can now use Fork-Join, JAXB and JAX-WS web services (with only a single hardware thread, however).
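
A quick way to confirm that Fork-Join works on the new JVM is a small divide-and-conquer job. The sketch below estimates pi by midpoint integration of 4/(1+x^2) over [0,1]; the class name, the split threshold and the interval count are arbitrary choices for illustration, not tuned values.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

class ForkJoinPi {
    /** Integrates 4/(1+x^2) over the sub-intervals [lo, hi) out of n in total. */
    static class PiTask extends RecursiveTask<Double> {
        private static final int THRESHOLD = 1000; // arbitrary sequential cutoff
        private final int lo;
        private final int hi;
        private final int n;

        PiTask(int lo, int hi, int n) {
            this.lo = lo;
            this.hi = hi;
            this.n = n;
        }

        @Override
        protected Double compute() {
            if (hi - lo <= THRESHOLD) {
                // Small enough: sum the midpoint samples directly.
                double h = 1.0 / n;
                double sum = 0.0;
                for (int i = lo; i < hi; i++) {
                    double x = h * (i + 0.5);
                    sum += 4.0 / (1.0 + x * x);
                }
                return h * sum;
            }
            // Otherwise split in half, fork one half and compute the other here.
            int mid = lo + (hi - lo) / 2;
            PiTask left = new PiTask(lo, mid, n);
            PiTask right = new PiTask(mid, hi, n);
            left.fork();
            return right.compute() + left.join();
        }
    }

    static double estimatePi(int intervals, int threads) {
        ForkJoinPool pool = new ForkJoinPool(threads);
        try {
            return pool.invoke(new PiTask(0, intervals, intervals));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int threads = Runtime.getRuntime().availableProcessors();
        System.out.println(estimatePi(1000000, threads));
    }
}
```

On a single-core board the pool has only one hardware thread to work with, so this checks that the API runs rather than showing any speedup.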

DI 1: Powering your Raspberry PI cluster

On some routers you will not get a DHCP-assigned address if all the clustered Raspberry Pi boards are powered up at once. This only occurs when not enough current is available, in which case you will need to stagger the power-up.

Using a good Agilent bench power supply, 8 boards draw from 3A (idle) to 3.5A (startup), which is 15-17.5W at 5V.


You can use a powered USB hub for 4 boards, but 8 will require a better power supply, such as an Agilent bench supply. Bench supplies are available up to 40A, but a typical one delivers 5A, which is good for up to 12 Raspberry Pis running at 100% CPU. A cluster of 32 boards needs more than that.
A good ATX power supply will suffice to power a cluster of Raspberry Pi boards. In this example I have a 450W supply that provides 30A on the 5V rail (make sure you put some load on the 5V and 12V rails as well).

Get the ATX adapter and breakout board from SparkFun and make sure you use multiple 24 gauge or thicker wires to distribute the load (1 wire will overheat, 2 wires reach 28 deg C; use at least 4 if you go over 8 boards).



As you can see, I have yet to fully integrate the power supply interface between the ATX supply and the breadboard bus for the 8 Pis, but we are functioning fine and are no longer limited by the bench supplies or individual 5V USB connectors. (The blue LED boards are Parallax Propeller 8-core microcontrollers, which use a per-core output indicator for now.)
I need some sort of protection fuse in case of a short circuit. It was very stressful connecting my 8 Raspberry Pi boards to the ATX supply after testing on one. I recommend working with all GPIO pins covered by a flat cable connector (only 1 header is populated instead of 2 on the latest Rev B board).

20130126: I now have 24 of 32 Raspberry Pi boards powered up. However, running the full peak 15A off an ATX power supply is not practical: it requires some serious wire gauge, and my 3-wire 24 gauge setup is overheating. Also, if you accidentally short the power supply, it will deliver the full 15-40A and burn your wire. I accidentally shorted the leads of a 5A supply on my metal breadboard, and the supply wire started to smell and melt. This brings us to the recommended way to power a large cluster of Raspberry Pi boards: separate bench power supplies. When I shorted the bench supply it held steady at 5.2A, which is safe enough not to burn your house down before you notice it.

 

Recommended power supply setup for 32 raspberry pi board cluster

Keeping to no more than 8 Raspberry Pi boards per 5A power supply leaves room to add some peripherals, like an Adafruit display or a Propeller 8-core coprocessor on an SPI bus.
This is kind of expensive, but instead of using one 40A bench supply at around $350 I use 4 separate 5A bench supplies (3 Circuit-Test PSC-520 supplies at $225 each and 1 Agilent U8002A supply at $450).
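
The sizing rule can be written down as a small sketch. It takes the 3.5A startup draw measured for 8 boards and assumes the draw scales linearly with the number of boards; the class and method names are made up for this example.

```java
// Rough power budget for 5A bench supplies.
// Assumption: current draw scales linearly with the number of boards.
class PowerBudget {
    static final double SUPPLY_LIMIT_AMPS = 5.0;
    static final double STARTUP_AMPS_PER_8_BOARDS = 3.5; // measured for 8 boards

    /** Number of supplies needed, rounding up to a whole supply. */
    static int suppliesNeeded(int boards, int boardsPerSupply) {
        return (boards + boardsPerSupply - 1) / boardsPerSupply;
    }

    /** Estimated startup current for a group of boards. */
    static double startupAmps(int boards) {
        return boards * STARTUP_AMPS_PER_8_BOARDS / 8.0;
    }

    /** Current left over for peripherals on one 5A supply. */
    static double headroomAmps(int boardsOnSupply) {
        return SUPPLY_LIMIT_AMPS - startupAmps(boardsOnSupply);
    }

    public static void main(String[] args) {
        System.out.println("Supplies for 32 boards: " + suppliesNeeded(32, 8));
        System.out.println("Headroom with 8 boards: " + headroomAmps(8) + "A");
    }
}
```

With 8 boards per supply this leaves about 1.5A for peripherals at startup and calls for 4 supplies for the full 32 boards.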

DI 2: Updating your board for 512MB RAM (470MB from 224MB)

The Rev 2 board has double the RAM but requires updated firmware to enable it.
https://github.com/Hexxeh/rpi-update
sudo wget http://goo.gl/1BOfJ -O /usr/bin/rpi-update && sudo chmod +x /usr/bin/rpi-update
sudo apt-get install git-core
sudo rpi-update
- reboot after firmware update

DI 3: Overclocking

The LAN chip heats up to 52 degrees Celsius, from a normal 45, when the Raspberry Pi is overclocked from 700 to 800 MHz.



DI 4: Setup Networking

Wireless is kind of unreliable, so I recommend wired.
The WiPi module from Element 14 works essentially out of the box.

Wired networking

After duplicating all 32 SD cards, put them one at a time into one of the Raspberry Pi boards and change the hostname, hosts and static network interface settings.

sudo nano /etc/hostname

sudo nano /etc/hosts


sudo nano /etc/network/interfaces
iface eth0 inet static
address 192.168.4.101
netmask 255.255.255.0
gateway 192.168.4.1
dns-nameservers 4.2.2.1

# here we do not rely on our internet provider's DNS servers - we use the Level 3 public DNS server at 4.2.2.1 as a more reliable DNS server

sudo nano /etc/resolv.conf
nameserver 4.2.2.1
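
Editing 32 cards by hand is error-prone, so it can help to generate the name and address list once and paste it into /etc/hosts on every board. The sketch below assumes a numbering scheme of rpi0, rpi1, ... mapped to consecutive addresses starting at 192.168.4.101; that mapping is my assumption for illustration, so adjust it to whatever scheme you actually use.

```java
// Generates /etc/hosts entries for the cluster.
// Assumption: node N is named rpiN and gets address 192.168.4.(101 + N).
class HostsGenerator {
    static final String SUBNET = "192.168.4.";
    static final int FIRST_HOST = 101;

    static String hostname(int node) {
        return "rpi" + node;
    }

    static String address(int node) {
        return SUBNET + (FIRST_HOST + node);
    }

    /** One "address<TAB>hostname" line per node. */
    static String hostsEntries(int nodes) {
        StringBuilder sb = new StringBuilder();
        for (int node = 0; node < nodes; node++) {
            sb.append(address(node)).append('\t').append(hostname(node)).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(hostsEntries(32));
    }
}
```

The same address(node) value is what goes on the address line of /etc/network/interfaces for that board.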

DI 5: Setup Java



Setup Tomcat
Log in to the manager app using "system:raspberry".




Setup Fortran and MPICH


The issue is that performance drops (likely due to network overhead) as I increase the number of nodes (currently 6 Pis): the wall clock time grows with every process added.

pi@rpi0 ~ $ mpiexec -f machinefile -n 1 ~/mpich_build/examples/cpi
Process 0 of 1 is on rpi0
pi is approximately 3.1415926544231341, Error is 0.0000000008333410
wall clock time = 0.017286

pi@rpi0 ~ $ mpiexec -f machinefile -n 2 ~/mpich_build/examples/cpi
Process 0 of 2 is on rpi0
Process 1 of 2 is on rpi1
pi is approximately 3.1415926544231318, Error is 0.0000000008333387
wall clock time = 0.020435

pi@rpi0 ~ $ mpiexec -f machinefile -n 4 ~/mpich_build/examples/cpi
Process 1 of 4 is on rpi1
Process 0 of 4 is on rpi0
Process 2 of 4 is on rpi2
Process 3 of 4 is on rpi3
pi is approximately 3.1415926544231239, Error is 0.0000000008333307
wall clock time = 0.037727

pi@rpi0 ~ $ mpiexec -f machinefile -n 6 ~/mpich_build/examples/cpi
Process 2 of 6 is on rpi0
Process 1 of 6 is on rpi1
Process 0 of 6 is on rpi2
Process 3 of 6 is on rpi3
Process 4 of 6 is on rpi4
Process 5 of 6 is on rpi5
pi is approximately 3.1415926544231239, Error is 0.0000000008333307
wall clock time = 0.043331
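
To put numbers on the slowdown, the usual measures are speedup (single-process time divided by n-process time) and efficiency (speedup divided by n). The sketch below applies them to the wall clock times from the four runs above; the class name is illustrative.

```java
// Speedup and parallel efficiency for the cpi runs logged above.
class MpiScaling {
    /** Speedup = T(1) / T(n); values below 1 mean the run got slower. */
    static double speedup(double t1, double tn) {
        return t1 / tn;
    }

    /** Efficiency = speedup / n; 1.0 would be perfect scaling. */
    static double efficiency(double t1, double tn, int processes) {
        return speedup(t1, tn) / processes;
    }

    public static void main(String[] args) {
        int[] processes = {1, 2, 4, 6};
        double[] seconds = {0.017286, 0.020435, 0.037727, 0.043331}; // from the runs above
        for (int i = 0; i < processes.length; i++) {
            System.out.println(String.format("n=%d speedup=%.2f efficiency=%.2f",
                    processes[i],
                    speedup(seconds[0], seconds[i]),
                    efficiency(seconds[0], seconds[i], processes[i])));
        }
    }
}
```

For 6 processes the speedup is roughly 0.4, so the run is about 2.5 times slower than on a single board. With the whole job taking only about 17 ms on one board, process startup and message passing likely dominate; a much larger problem size would be a fairer test of the cluster.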

Log:
20121121: Setup 4 networked PIs
20130127: power up of 24 raspberry pi boards

BOM:

32 Raspberry Pi boards from Element 14 @ $35 = $1120
32 SanDisk Ultra 16GB SD cards @ $10-18 = $320-576
0 micro USB cables = $0
1 HDMI cable from Apple = $20
8 power supply cables @ $10 = $80
4 bench 5A power supplies from Agilent or Circuit-Test @ $224-450 = $896-1800
4 large breadboards (that fit 8 Raspberry Pi boards) @ $45 = $180
64 bendable Arduino headers from www.evilmadscience.com @ $1 = $64
5 Gigabit 8-port network hubs or 2 16-port hubs from D-Link @ $65 = $325
32 Belkin flexible network cables from the Apple store @ $15 = $480

Total = $3885.00

Copies of this article
http://www.framboise314.fr/32-raspberry-pi-pour-du-calcul-parallele-en-java/
