Internet.com, IW Online, Infrastructure News, February 8, 1999
Coping With an Infinity of Transactions
By Brian Caulfield
Over the past year, Mike Wilson built a system for busy auction site eBay that allows thousands of customers to buy and sell online at once. Now he's trying to make sure the system never fails--but first he's got to find names for all his servers.
Wilson, eBay's vice president of product development and site operations, said names are his biggest problem. All of eBay's servers are named after reptiles. "We use snakes and lizards because they're scaleable," he joked.
The quip touches on Wilson's biggest challenge. As the number of users climbs, the task of supporting the growing transaction volume grows more complex. At the same time, every outage--however brief--gets more and more attention.
To handle the growth and boost reliability, the firm has transformed itself from a site running on the free Unix variant BSD and two Intel boxes (when Wilson arrived two years ago) to one built around two $1.4 million Sun Enterprise 10000 Starfire servers (dubbed Python and Anaconda)--supplemented by some 47 Compaq machines running Windows NT. EBay has also bought an encyclopedia of herpetology to find names for all those servers.
"If you go to another commerce site--to look at a book, say--and something happens, you can come back later," Wilson said. "The book won't go away; it will still be there when you return. Well, at eBay time is money. If we have a glitch or something, that auction has lost time," he said, adding that eBay's policy is to extend any auctions that are interrupted by outages.
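The extension policy Wilson describes can be illustrated with a small sketch. The article does not say how eBay computes an extension, so the rule here (push the closing time back by exactly the time the auction was unavailable) is an assumption, and the function name and the sample times are hypothetical.

```python
from datetime import datetime, timedelta


def extended_end(start, end, outage_start, outage_end):
    """Return the auction's closing time after compensating for an outage.

    Assumption: the auction is extended by the length of the overlap
    between its run and the outage window. The article states only that
    interrupted auctions are extended, not by how much.
    """
    overlap_start = max(start, outage_start)
    overlap_end = min(end, outage_end)
    # A negative difference means the outage never touched the auction.
    lost = max(overlap_end - overlap_start, timedelta(0))
    return end + lost


# Hypothetical auction and outage times, chosen only for illustration.
auction_start = datetime(1998, 12, 5, 12, 0)
auction_end = datetime(1998, 12, 8, 12, 0)
outage_start = datetime(1998, 12, 7, 10, 0)
outage_end = outage_start + timedelta(hours=9, minutes=40)

new_end = extended_end(auction_start, auction_end, outage_start, outage_end)
print(new_end)
```

An auction that closed before the outage began, or opened after it ended, keeps its original closing time under this rule.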
EBay founder and chairman Pierre Omidyar pioneered the person-to-person auction format with a site he originally built to trade Pez dispensers. Since then it has been a hit with investors and users alike, with a valuation of more than $8.6 billion in the wake of last fall's initial public offering.
Wilson said that out of the several hundred million page views the site logs each month, about half generate some kind of transaction, such as listing an item for sale, making a bid, investigating a seller's history, or searching for an item.
Wilson compares eBay's complexity to that of an airline reservations system. "Except that every day well over half a million people pick seats on our 'flights.' Plus, you're allowed to search--say for an old college buddy--on any flight at any time; you can even search for feedback on the person who sat next to you on the last flight to see how they did. And on top of all that, instead of being accessible only by travel agents and people at the airport, it's accessible to anyone on the Internet," he said. "It's just a massive thing."
Perhaps inevitably, there have been problems. EBay was plagued by an unusual number of outages last December--though company executives said none of them was caused by capacity problems. The first, on Dec. 7, lasted 9 hours and 40 minutes and was caused by a software glitch. The second, on Dec. 9, lasted 8 hours and 8 minutes and was also the result of a software glitch. A third, on Dec. 18, lasted 3 hours and 11 minutes and was caused by the failure of a chip in one of eBay's Sun machines, which forced the machine to reboot, according to Wilson.
Finally, a second hardware-related outage occurred on Dec. 24 and lasted 50 minutes. To solve the problem, Wilson bought a Sun Enterprise 6500 to use as a warm backup machine. Wilson said he won't be sure that this has solved the problem until he conducts more tests.
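To put the four December outages in perspective, the durations reported above can be totaled and expressed as a share of the month. This is a back-of-the-envelope sketch, not a figure from eBay: it assumes the four outages listed were the only downtime that month and measures against a full 31-day December.

```python
from datetime import timedelta

# Outage durations as reported in the article.
outages = {
    "Dec. 7": timedelta(hours=9, minutes=40),   # software glitch
    "Dec. 9": timedelta(hours=8, minutes=8),    # software glitch
    "Dec. 18": timedelta(hours=3, minutes=11),  # chip failure, reboot
    "Dec. 24": timedelta(minutes=50),           # hardware-related
}

# Sum the durations; the timedelta() start value keeps the result a timedelta.
total_downtime = sum(outages.values(), timedelta())

# Assumption: availability is measured over all 31 days of December.
month = timedelta(days=31)
availability = 1 - total_downtime / month

print(total_downtime)         # 21:49:00
print(f"{availability:.2%}")  # 97.07%
```

Under those assumptions the site was down for just under 22 hours, or roughly 3 percent of the month.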
Another problem has been finding a way to let users search the site even as eBay's content churns continuously. "We've got a search engine that not only has to do millions of searches a day, but has to be able to withstand the business of having to update the site every hour," Wilson said. "We had search vendors come in and tell us they had a great product, and we'd point a little of our load at it and it would melt into a puddle of metal on the floor."
Wilson said the only survivor was Texis, a relational database developed by Thunderstone Software that specializes in managing and searching full text. Texis runs full time on one of eBay's Sun Enterprise 10000 machines.
Transactions are handled by eBay's other Enterprise 10000 server, which runs an Oracle database. Storage is handled by an array of Sun D-1000 RAID machines. Compaq Proliant machines running Microsoft Internet Information Server (IIS) give users access to ISAPI applications, listing servers, page servers, and image servers. Resonate Inc.'s Central Dispatch is used to balance the load between the machines.
To get access to bandwidth, eBay houses its servers at a San Jose, Calif., facility run by Exodus Communications. EBay gets 200 Mbps of bandwidth from Exodus--which aggregates bandwidth from ISPs and resells it to its customers--in addition to DS-3 links it leases directly from Savvis, Internap, Uunet, and Sprint. Wilson reports that eBay's bandwidth utilization is in excess of 100 Mbps.
In the long term, Wilson--citing Silicon Valley's history of earthquakes--is considering using mirroring to expand eBay's capacity and to aid disaster recovery.
"Unfortunately, we can't use traditional mirroring techniques, since the site is not static," he said. "We're going to have to do some interesting and aggressive things to make that work. But we can't talk about that now."
© Mecklermedia Corporation.