“How much server space do companies like Google, Amazon, or YouTube, or for that matter Hotmail and Facebook need to run their sites?” is the question I’ve been asked to answer on ABC Radio National Drive this evening.
This isn’t a simple question to answer as the details of data storage are kept secret by most online services.
Figuring out how much data is saved in the world’s computer systems is a daunting task in itself. In 2011, scientists estimated that 295 exabytes of data were stored across the Internet, desktop hard drives, tape backups and other systems as of 2007.
An exabyte is the equivalent of 50,000 years’ worth of DVD video. A typical new computer comes with a terabyte hard drive, so one exabyte is the equivalent of the storage in a million new computers.
The numbers in this field are so great that petabytes are probably the most practical unit of measurement: a thousand petabytes make up an exabyte, and a single petabyte is the equivalent of filling the hard drives of a thousand new computers.
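For anyone who wants to check the arithmetic, here is a minimal Python sketch of those conversions, assuming decimal (SI) units and a one-terabyte drive in each new computer.

```python
# Back-of-the-envelope unit conversions, using decimal (SI) units:
# 1 exabyte = 1,000 petabytes = 1,000,000 terabytes.
TERABYTE = 10 ** 12   # bytes
PETABYTE = 10 ** 15
EXABYTE = 10 ** 18

new_pc_drive = 1 * TERABYTE        # assumed: a typical new computer's hard drive

print(EXABYTE // new_pc_drive)     # 1,000,000 new computers to hold one exabyte
print(PETABYTE // new_pc_drive)    # 1,000 new computers to hold one petabyte
```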
Given cloud computing and data centres have grown exponentially since 2007, it’s possible that number has doubled in the last five years.
In 2009 it was reported that Google was planning for ten million servers and an exabyte of information. It’s almost certain that point has been passed, particularly given the volume of data being uploaded to YouTube, which alone receives 72 hours’ worth of new video every minute.
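To get a feel for what that upload rate means in storage terms, here is a rough sketch; the figure of one gigabyte per hour of video is my own assumption, not anything YouTube has published.

```python
# A very rough estimate of YouTube's daily upload volume, assuming
# (hypothetically) that an hour of uploaded video averages about 1 GB.
GIGABYTE = 10 ** 9
PETABYTE = 10 ** 15

hours_uploaded_per_minute = 72
assumed_gb_per_hour = 1                     # assumption, not a YouTube figure

hours_per_day = hours_uploaded_per_minute * 60 * 24       # 103,680 hours of video
bytes_per_day = hours_per_day * assumed_gb_per_hour * GIGABYTE

print(bytes_per_day / PETABYTE)             # roughly 0.1 petabytes of new video a day
print(bytes_per_day * 365 / PETABYTE)       # getting on for 40 petabytes a year
```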
Facebook is struggling with similar growth, and it’s reported the social media service is having to rewrite its database. Last year, Facebook users were reportedly uploading six billion photos a month, and at the time of its float on the US stock market the company claimed to hold over 100 petabytes of photos and video.
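Those two figures roughly hang together. As a quick sanity check, here is a sketch assuming each stored photo averages around a megabyte – my assumption, not one Facebook has disclosed.

```python
# A sanity check on the Facebook numbers, assuming (hypothetically) that
# each photo averages about 1 MB once stored.
MEGABYTE = 10 ** 6
PETABYTE = 10 ** 15

photos_per_month = 6 * 10 ** 9
assumed_bytes_per_photo = 1 * MEGABYTE      # assumption, not a Facebook figure

bytes_per_year = photos_per_month * assumed_bytes_per_photo * 12
print(bytes_per_year / PETABYTE)            # roughly 70 petabytes of new photos a year
```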
According to one of Microsoft’s blogs, Hotmail has over a billion mailboxes and “hundreds of petabytes of data”.
For Amazon, details are harder to find. In June 2012 Amazon’s founder, Jeff Bezos, announced that its S3 cloud storage service was hosting a trillion ‘objects’. If we assume those objects – which could be anything from a single picture to a database running on Amazon’s service – average a megabyte each, then that’s an exabyte of storage.
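Here is that back-of-the-envelope calculation spelled out; the one-megabyte average object size is purely an assumption.

```python
# The Amazon S3 back-of-the-envelope: a trillion objects at an assumed
# average size of 1 MB each works out to roughly an exabyte.
MEGABYTE = 10 ** 6
EXABYTE = 10 ** 18

s3_objects = 10 ** 12                       # a trillion objects
assumed_avg_object_size = 1 * MEGABYTE      # assumption: objects range from tiny files to huge databases

print(s3_objects * assumed_avg_object_size / EXABYTE)    # 1.0 exabyte
```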
The amount of storage is only one part of the equation; we have to be able to do something with the data we’ve collected, so we also have to look at processing power. This comes down to the number of computer chips – CPUs, or Central Processing Units – being used to crunch the information.
Probably the most impressive data cruncher of all is the Google search engine, which processes phenomenal amounts of data every time somebody does a search on the web. Google have put together an infographic that illustrates how they manage to answer over a billion queries a day in an average time of less than a quarter of a second.
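To put that query volume in perspective, a billion searches a day works out to a remarkable number every second.

```python
# What "over a billion queries a day" looks like on a per-second basis.
queries_per_day = 1 * 10 ** 9
seconds_per_day = 24 * 60 * 60              # 86,400 seconds

print(queries_per_day / seconds_per_day)    # roughly 11,600 searches every second
```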
Google is reported to own 2% of the world’s servers, although the company is very secretive about the numbers; estimates based on power usage in 2011 put its server count at around 900,000. Given Google invests about 2.5 billion US dollars a year in new data centres, it’s safe to say the company has probably passed the one million mark.
How much electricity all of this equipment uses is a valid question. According to Jonathan Koomey of Stanford University, US data centres use around 2% of the nation’s power supply and globally these facilities use around 1.5%.
The numbers involved in answering the question of how much data is stored by web services are mind-boggling, and they are growing exponentially. One of the problems with researching a topic like this is how quickly the source data becomes outdated.
It’s easy to overlook the complexity and size of the technologies that run social media, cloud computing or web searches. Asking questions about how these services work is essential to understanding the things we now take for granted.