We had a Hadoop presentation in the Emerging Technology track at CEC 2008. While it gave the uninitiated a good insight into the technology, it left out one important thing: how to use it to sell. I talked to some colleagues who were a little bit perplexed about the use cases in the real world.
Of course it's a technology that was born on the web, but when you really think about it, more use cases come to mind. It's about processing data with simple means.
Customers already have the resources to build a Hadoop cluster. Often they have dozens of servers, and not all of them are equally loaded. So: start up the Fair Share Scheduler of Solaris, give Hadoop 10% of your system and leave 90% for its real application. You already have the storage for the Hadoop file system, too. A Solaris or Linux installation takes 5-10 GB at most, but you have 73 to 146 GB disks in your systems. That's perfect ...
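To put a rough number on that spare disk space, here is a back-of-the-envelope sketch. The server count of 24 is just an assumption standing in for "dozens", the function name is made up for this example, and the replication factor of 3 is the HDFS default, which you may well configure differently.

```python
def usable_hdfs_capacity_gb(servers, disk_gb, os_gb, replication=3):
    """Estimate usable HDFS space from the spare local disk of existing servers.

    servers:     number of machines contributing a disk (assumed value)
    disk_gb:     size of the local disk per server
    os_gb:       space taken by the operating system installation
    replication: number of copies HDFS keeps of each block (3 is the default)
    """
    spare_per_server = disk_gb - os_gb
    raw_capacity = servers * spare_per_server
    # Every block is stored `replication` times, so usable space shrinks accordingly.
    return raw_capacity / replication


if __name__ == "__main__":
    # Worst case from the text: small 73 GB disks, OS taking the full 10 GB.
    print(usable_hdfs_capacity_gb(servers=24, disk_gb=73, os_gb=10))
    # Better case: 146 GB disks, same OS footprint.
    print(usable_hdfs_capacity_gb(servers=24, disk_gb=146, os_gb=10))
```

Even in the pessimistic case this leaves roughly half a terabyte of replicated storage that is sitting around unused today.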
What are matching loads for such a construct? Everybody has such loads. For example, scanning logfiles for certain patterns (think about getting situational awareness from the data of all the intrusion detection sensors around your network). Or think about optical character recognition of scanned quotes, bills and receipts. Or you have to do a mass conversion job (from the old format of your scanned paper to a new one): let's use already existing resources for it at first. Or processing all the process data from your automated test tools: well ... implement this process in Java or Pig and let it run on your Hadoop cluster. Or analysing database information with a sequence of Hadoop jobs to find patterns. And those are just the ideas I've got in a few minutes. And it's really easy to write such jobs with Pig.
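To make the logfile example a bit more concrete, here is a minimal sketch of the map and reduce steps in plain Python, run locally on a few sample lines. The pattern names and regular expressions are hypothetical, and so are the function names; on a real cluster you would wire the same two steps in as mapper and reducer (for example via Hadoop Streaming) or express them in Java or Pig.

```python
import re
from collections import defaultdict

# Hypothetical patterns for illustration; real sensors have their own signatures.
PATTERNS = {
    "failed_login": re.compile(r"Failed password"),
    "port_scan": re.compile(r"port scan", re.IGNORECASE),
}


def map_line(line):
    """Map step: emit (pattern_name, 1) for every pattern found in one log line."""
    for name, regex in PATTERNS.items():
        if regex.search(line):
            yield name, 1


def reduce_counts(pairs):
    """Reduce step: sum up the counts per pattern name."""
    totals = defaultdict(int)
    for name, count in pairs:
        totals[name] += count
    return dict(totals)


def run_local(lines):
    """Run map and reduce in one process; a cluster would spread the map step over many nodes."""
    pairs = (pair for line in lines for pair in map_line(line))
    return reduce_counts(pairs)


if __name__ == "__main__":
    sample = [
        "Nov 19 11:32:01 host1 sshd: Failed password for root",
        "Nov 19 11:32:05 host2 ids: possible Port Scan from 10.0.0.7",
        "Nov 19 11:32:09 host1 sshd: Failed password for admin",
        "Nov 19 11:32:12 host3 cron: job finished",
    ]
    print(run_local(sample))
```

The map step looks at each line in isolation, which is exactly why the work can be split across as many nodes as you have logfile chunks.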
There are real reasons to do such stuff in a distributed manner. Instead of hours, you could process a multi-TB logfile within minutes by distributing the job to many nodes. And, dear colleagues, when the admins and developers of a company get more and more fluent in working with an environment like Hadoop, they don't want to use the residual computing time of existing servers any more, they want new servers.