IBM Builds Biggest Data Drive Ever

The system could enable detailed simulations of real-world phenomena—or store 24 billion MP3s.

  • Thursday, August 25, 2011
  • By Tom Simonite

A data repository almost 10 times bigger than any made before is being built by researchers at IBM's Almaden, California, research lab. The 120 petabyte "drive"—that's 120 million gigabytes—is made up of 200,000 conventional hard disk drives working together. The giant data container is expected to store around one trillion files and should provide the space needed to allow more powerful simulations of complex systems, like those used to model weather and climate.

A 120 petabyte drive could hold 24 billion typical five-megabyte MP3 files or comfortably swallow 60 copies of the biggest backup of the Web, the 150 billion pages that make up the Internet Archive's Wayback Machine.
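The figures quoted above can be checked with a little arithmetic. The sketch below assumes decimal (SI) units throughout and treats the 120 petabytes as spread evenly across the 200,000 drives; the article states neither the per-drive capacity nor the size of the Wayback Machine archive, so those two results are only implied values.

```python
# Back-of-envelope check of the article's figures, using decimal (SI) units.
PB = 10**15
GB = 10**9
MB = 10**6

capacity_bytes = 120 * PB   # total capacity quoted in the article
drive_count = 200_000       # number of hard disk drives quoted in the article
mp3_bytes = 5 * MB          # "typical five-megabyte MP3"

# 120 petabytes expressed in gigabytes.
capacity_gb = capacity_bytes // GB

# Implied average capacity per drive (assumption: capacity is split evenly
# and no space is set aside for redundancy).
avg_drive_gb = capacity_bytes // drive_count // GB

# Number of five-megabyte MP3 files that would fit.
mp3_count = capacity_bytes // mp3_bytes

# Implied archive size if 60 copies fill the drive exactly (an upper bound,
# since the article only says the copies fit "comfortably").
wayback_bytes_implied = capacity_bytes // 60

print(f"{capacity_gb:,} GB total")
print(f"{avg_drive_gb:,} GB per drive on average")
print(f"{mp3_count:,} MP3 files")
print(f"{wayback_bytes_implied // PB} PB implied per archive copy")
```

Under these assumptions the totals agree with the article: 120 million gigabytes and 24 billion MP3 files, with an implied average of about 600 gigabytes per drive.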

The data storage group at IBM Almaden is developing the record-breaking storage system for an unnamed client that needs a new supercomputer for detailed simulations of real-world phenomena. However, the new technologies developed to build such a large repository could enable similar systems for more conventional commercial computing, says Bruce Hillsberg, director of storage research at IBM and leader of the project.

"This 120 petabyte system is on the lunatic fringe now, but in a few years it may be that all cloud computing systems are like it," Hillsberg says. Just keeping track of the names, types, and other attributes of the files stored in the system will consume around two petabytes of its capacity.
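To put the metadata figure in perspective, here is a rough calculation that combines it with the file count of around one trillion given earlier in the article. It assumes decimal units and an even spread of metadata across files, neither of which the article confirms.

```python
# Rough estimate of metadata overhead, using decimal (SI) units.
PB = 10**15

capacity_bytes = 120 * PB   # total capacity quoted in the article
metadata_bytes = 2 * PB     # "around two petabytes" of file attributes
file_count = 10**12         # "around one trillion files"

# Average metadata per file (assumption: spread evenly across all files).
metadata_per_file = metadata_bytes // file_count

# Share of the total capacity taken up by metadata.
metadata_fraction = metadata_bytes / capacity_bytes

print(f"{metadata_per_file:,} bytes of metadata per file")
print(f"{metadata_fraction:.1%} of total capacity")
```

On these numbers, each file carries roughly two kilobytes of metadata, and the metadata as a whole takes up a little under 2 percent of the system.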

Steve Conway, a vice president of research with the analyst firm IDC who specializes in high-performance computing (HPC), says IBM's repository is significantly bigger than previous storage systems. "A 120-petabyte storage array would easily be the largest I've encountered," he says. The largest arrays available today are about 15 petabytes in size. Supercomputing problems that could benefit from more data storage include weather forecasts, seismic processing in the petroleum industry, and molecular studies of genomes or proteins, says Conway.

IBM's engineers developed a series of new hardware and software techniques to enable such a large hike in data-storage capacity. Finding a way to efficiently combine the thousands of hard drives that the system is built from was one challenge. As in most data centers, the drives sit in horizontal drawers stacked inside tall racks. Yet IBM's researchers had to make those drawers significantly wider than usual to fit more disks into a smaller area. The disks must also be cooled with circulating water rather than standard fans.

The inevitable failures that occur regularly in such a large collection of disks present another major challenge, says Hillsberg. IBM uses the standard tactic of storing multiple copies of data on different disks, but it employs new refinements that allow a supercomputer to keep working at almost full speed even when a drive breaks down.

ghaller

6 Comments

  • 637 Days Ago
  • 08/26/2011

usable disk space

After accounting for RAID space, cluster overhead, OS overhead, and the fact that hardware vendors count differently than software systems, the available space may 'only' be half of the advertised amount.

yuevbot

27 Comments

  • 635 Days Ago
  • 08/28/2011

Nice free ad for IBM...

...but nothing earthshaking, or even mildly interesting. Water cooling -- IBM's been doing that since the late 60's. RAID -- 70's.

sentee

3 Comments

  • 635 Days Ago
  • 08/28/2011

Re: Nice free ad for IBM...

This is a lot more than just RAID. See http://public.dhe.ibm.com/common/ssi/ecm/en/dcs03005usen/DCS03005USEN.PDF for an overview of GPFS.

For enterprise storage systems these days, basic RAID is obsolete. While the fundamental concepts of data striping and data redundancy are still used, these storage subsystems don't even fit any of the RAID models.

By jumping to a size that is 10 times any previous effort, IBM could see all kinds of scaling issues and bottlenecks appear. While some of that 10x comes from bigger capacity drives, you've also got a lot more drives, data paths, and metadata to worry about.

SRD75

1 Comment

  • 634 Days Ago
  • 08/29/2011

Their project customer

I'm willing to bet their customer for this project will be the Square Kilometre Array.

JagadeeshA

17 Comments

  • 626 Days Ago
  • 09/06/2011

Biggest Data Drive from IBM

IBM Builds Biggest Data Drive - fantastic. It brings into focus the need for more data to store!

Dr.A.Jagadeesh  Nellore(AP),India
E-mail: anumakonda.jagadeesh@gmail.com

kevswag

1 Comment

  • 624 Days Ago
  • 09/08/2011

Will they need 2?

How will they back that up? :)
