Tuesday, February 03, 2009

Need a supercomputer?

Need a supercomputer? This guy builds 'em himself

Astrophysicist needs supercomputers so he builds his own
By Tim Greene

Bruce Allen is perhaps the world's best do-it-yourselfer. When he needed a supercomputer to crunch the results of gravitational-wave research, he built one with his colleagues at the University of Wisconsin-Milwaukee.

That was in 1998. Since then he has built three more supercomputers, all in pursuit of gravitational waves, which theoretically emanate from black holes orbiting each other and from exploding stars but have never been directly observed.

His most recent supercomputer, a cluster of 1,680 machines with four cores each, is in Hanover, Germany. Essentially, it's a 6,720-core machine that, in the months after it was built, was ranked number 58 in the world. "We filled our last row of racks recently, and we're number 79 on the current top 500 list now," says Allen, a director at the Max Planck Institute for Gravitational Physics.

He builds his own for several reasons, including that he thinks he gets more for his money when he does the work himself.

"If you go to a company -- Dell or IBM -- and you say, 'I've got a $2 million budget, what can you sell me for that price?' you'll come back with a certain number of CPUs," he says.

"If you then go and look at Pricewatch or some other place where you can find out how much the gear really costs, you find out that if you build something yourself with the same money you'll end up with two or three times the processing power."

The problem is that big-name companies carry a lot of overhead in the form of layers of management and engineering. "They do sell good products, and you don't need to have any particular expertise to buy them," he says. "It's always been my experience that if I do it myself I get more bang for my buck."

For instance, his first supercomputer was a Linux cluster built from 48 bargain DEC AlphaServers that had been discontinued, each with a single 300-MHz 64-bit AXP processor. "So I got a very good deal on them. I think the list price was $6,000 and I bought them after they were end-of-lifed for $800," Allen says. "The switch was a 3Com Superstack 100Mbps Ethernet switch. I think it was a pair of them, each with 24 ports connected by a matrix cable."

The servers sat on particle-board shelves bought at Home Depot, in a room slightly larger than a closet. "It wasn't even racks because rack-mounted systems would have raised the price significantly," Allen says. The whole thing used about 200 watts of power, and the university facilities staff had to remove flaps from the air ducts feeding the room so the heat could be carried away fast enough.

The total cost was about $70,000, which came from a National Science Foundation (NSF) grant. The grant was actually for eight high-end Sun workstations, but he spent it on the Linux cluster instead.

"About a year later I was giving a scientific talk about this, and the two program managers from the NSF came up to me afterwards," he says. "I sort of shamefacedly apologized. I said, 'Well, I hope you're not angry that I went ahead and did this anyway.'

"And they both laughed and said, 'Well, we're very, very happy. If it hadn't been successful, we wouldn't be saying that.' "

Another benefit of crafting his own is the control it gives him. Using a shared supercomputer creates unwelcome delays, he says. At the Caltech center for supercomputer applications, for example, he had to batch his jobs and wait two days until it was his turn. If he had made even a one-character error in a submit file, he would have to fix it and resubmit, and his job would move to the back of the queue for another two-day wait.

There were many such possibilities for setbacks. "Each of these things was a little inefficiency factor, maybe .8 or something like that. But there were six or eight or 10 of these things and all of those factors of .8; by the time you multiply them all together it was very difficult to actually get the work done," he says.
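
The compounding Allen describes is easy to check. The sketch below assumes every inefficiency is an independent multiplier of exactly 0.8, which is a simplification of his "maybe .8 or something like that"; the function name is made up for illustration.

```python
def compounded_efficiency(factor: float, count: int) -> float:
    """Overall efficiency when `count` independent steps each run at `factor`."""
    return factor ** count


# Allen's "six or eight or 10" factors, each assumed to be exactly 0.8.
for count in (6, 8, 10):
    overall = compounded_efficiency(0.8, count)
    print(f"{count} factors of 0.8 -> {overall:.1%} of ideal throughput")
```

Under that assumption, six such factors leave roughly a quarter of the ideal throughput and ten leave roughly a tenth.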

Allen says he has no formal training in building supercomputers. Most of what he uses is open-source Beowulf clustering technology, which he learned by feeling his way through it. "I don't think it takes particular expertise," he says. "Lots of people have set up Linux networks at home and any of those people with some money and some need for a compute cluster could build one, I think."

The most complicated thing about building a cluster is the networking, and the trickiest part of that is automating configuration of the boxes. When he started out on the 48-node cluster in 1998, he did each operation by hand on each server. "You quickly discover if it takes you five minutes per computer to do something and you have to do it 48 times, an entire morning or afternoon goes by and, what's more, you make mistakes," he says.
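
The cost of working by hand is easy to tally, and the usual remedy is to apply one scripted action to every node. The following is a minimal sketch of that idea, not Allen's actual tooling: the hostnames (`node01` through `node48`), the function names, and the placeholder `describe_install` action are all assumptions made for illustration. A real action might invoke an installer or a remote shell, which is left out here.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def node_names(count: int, prefix: str = "node") -> List[str]:
    """Generate hypothetical hostnames such as node01 .. node48."""
    return [f"{prefix}{i:02d}" for i in range(1, count + 1)]


def manual_minutes(nodes: int, minutes_per_node: float) -> float:
    """Total hands-on time when every node is configured one by one."""
    return nodes * minutes_per_node


def run_on_all(hosts: List[str], action: Callable[[str], str],
               max_workers: int = 16) -> Dict[str, str]:
    """Apply `action` to every host in parallel and record each outcome."""

    def attempt(host: str) -> str:
        try:
            return action(host)
        except Exception as exc:  # one bad node should not stop the rest
            return f"FAILED: {exc}"

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as `hosts`.
        return dict(zip(hosts, pool.map(attempt, hosts)))


def describe_install(host: str) -> str:
    """Placeholder action (an assumption); it only reports what it would do."""
    return f"would install OS image on {host}"


hosts = node_names(48)
print(f"By hand: {manual_minutes(len(hosts), 5) / 60:.0f} hours")
results = run_on_all(hosts, describe_install)
print(f"Automated: {len(results)} nodes handled in one call")
```

Because failures are recorded per host rather than raised, a single unreachable machine shows up in the results without interrupting the other 47.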

"So the name of the game is setting up automated systems to do things, like automated systems for installing operating systems and cloning machines and so forth. But there's lots of public domain tools out there for doing that."

Allen says he sees the supercomputers as a tool for observing gravitational waves, which in turn are just another tool for finding out more about the universe.

"For example, orbiting pairs of black holes don't emit any light, no optical, no radio, no X-rays," Allen says. "But they do emit gravitational waves so we'll be able to study such things by their gravitational wave emission. And who knows what else we'll discover? That's really our secret hope, that we'll find something really new."
