Hi,
Has anyone used Hadoop with Splunk, or any other real-time processing tool over Hadoop?
Regards, Shreya
This e-mail and any files transmitted with it are for the sole use of the intended recipient(s) and may contain confidential and privileged information. If you are not the intended recipient(s), please reply to the sender and destroy all copies of the original message. Any unauthorized review, use, disclosure, dissemination, forwarding, printing or copying of this email, and/or any action taken in reliance on the contents of this e-mail is strictly prohibited and may be unlawful. |
In reply to the original post (Fri, May 18, 2012 at 12:11 PM)
I'm experimenting with using Hadoop and Pig to load MongoDB with data for Cube to consume. Cube <https://github.com/square/cube/wiki> is a real-time tool... but we'll be replaying events from the past. Does that count? It is nice to be able to backfill metrics into 'real-time' systems in bulk, as a batch.
Russell Jurney twitter.com/rjurney [hidden email] datasyndrome.com |
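For anyone wondering what "batch backfill" into a real-time store looks like in practice, here is a minimal Python sketch of the idea in the post above. It only builds the bulk payloads and does not talk to Cube or MongoDB. The input record keys (`epoch_seconds`, `kind`, `payload`) are made up for illustration, and the `type`/`time`/`data` event shape is an assumption about what the target system accepts, so check it against the tool's own docs before relying on it.

```python
import json
from datetime import datetime, timezone
from itertools import islice


def to_event(record):
    """Turn one historical record into an event dict.

    The type/time/data field names are an assumption about the target
    system's event format; the record keys are hypothetical.
    """
    ts = datetime.fromtimestamp(record["epoch_seconds"], tz=timezone.utc)
    return {
        "type": record["kind"],
        "time": ts.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "data": record["payload"],
    }


def batches(events, size):
    """Yield lists of at most `size` events so each write is one bulk call."""
    it = iter(events)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def backfill(records, size=1000):
    """Return one JSON payload per batch, oldest events first."""
    ordered = sorted(records, key=lambda r: r["epoch_seconds"])
    return [json.dumps(chunk) for chunk in batches(map(to_event, ordered), size)]
```

Each returned string would then be handed to whatever bulk-write call the target system offers. Sorting by event time first keeps the replay in the same order the events originally happened.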
In reply to the post by Russell Jurney of May 18, 2012, at 3:29 PM
Why not HBase with Hadoop?
It's the best bet.
Rgds, Ravi
Sent from my Beethoven |
In reply to the post by Ravi Shankar Nair of May 18, 2012, at 2:01 PM
Because that isn't Cube.
Russell Jurney twitter.com/rjurney [hidden email] datasyndrome.com |
In reply to the post by Russell Jurney of Fri, May 18, 2012 at 3:58 PM
I have used both Hadoop and Splunk. Could you let me know what your requirement is?
Real-time processing with Hadoop depends on what "real time" means in your particular scenario. Depending on the requirement, real-time (or near-real-time) processing can be achieved.
~Abhishek |
In reply to the post by Abhishek Pratap Singh of Mon, May 21, 2012 at 5:13 PM
So a while back there was an article:
http://highscalability.com/how-rackspace-now-uses-mapreduce-and-hadoop-query-terabytes-data
I recently did my own take on full-text searching your logs with Solandra, though I have prototyped using Solr inside DataStax Enterprise as well.
http://www.edwardcapriolo.com/roller/edwardcapriolo/entry/more_taco_bell_programming_with
Splunk has a graphical front end with a good deal of sophistication, but I am quite happy just being able to search everything with Solr and providing my own front ends built on Solr. |
In reply to the post by Abhishek Pratap Singh of Tuesday, May 22, 2012 2:44 AM (subject: Re: Splunk + Hadoop)
Hi Abhishek,
I am looking at a scenario where a customer representative needs to respond to customers while they are on a call. The representative needs to search a huge volume of data and respond within a few seconds.
Thanks and Regards,
Shreya Pal
Architect Technology
Cognizant Technology Pvt Ltd
Vnet - 205594
Mobile - +91-9766310680 |
In reply to the post by Shreya Pal of Mon, May 28, 2012 at 12:41 PM
Hi Shreya,
If you are looking at data locality, then you may or may not be able to use Hadoop out of the box. It will all depend on how you design the data layout on top of HDFS and how you implement search for the customer queries.
A good idea might be to put a queryable database such as MySQL in between, where you can store the results of the data processed on Hadoop, and then use Solr for fast access and search.
Thanks, Nitin
-- Nitin Pawar |
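To make the "database in between" suggestion above a bit more concrete, here is a rough Python sketch of the pattern: a batch step precomputes per-customer results, loads them into an indexed table, and the interactive lookup only ever touches that table. Everything here is a stand-in chosen so the example runs on its own: `sqlite3` takes the place of MySQL, a plain function takes the place of the Hadoop job, and the table and column names are hypothetical. The Solr search layer is not shown.

```python
import sqlite3
from collections import defaultdict


def batch_summarize(raw_events):
    """Stand-in for the Hadoop job: aggregate raw events per customer."""
    summary = defaultdict(lambda: {"orders": 0, "total": 0.0})
    for customer_id, amount in raw_events:
        summary[customer_id]["orders"] += 1
        summary[customer_id]["total"] += amount
    return summary


def load_serving_table(conn, summary):
    """Write the precomputed results into an indexed, queryable table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customer_summary ("
        "customer_id TEXT PRIMARY KEY, orders INTEGER, total REAL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO customer_summary VALUES (?, ?, ?)",
        [(cid, s["orders"], s["total"]) for cid, s in summary.items()],
    )
    conn.commit()


def lookup(conn, customer_id):
    """Point lookup the representative's screen would issue, by primary key."""
    return conn.execute(
        "SELECT orders, total FROM customer_summary WHERE customer_id = ?",
        (customer_id,),
    ).fetchone()
```

The point of the split is that the heavy scan runs offline on whatever schedule the batch job allows, while the call-time query is a single keyed read. How fresh the answers are then depends on how often the batch load runs.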
In reply to the post by Shreya.Pal of 05/28/2012 12:12 AM (subject: RE: Splunk + Hadoop)
Shreya - there are two major considerations here. First, can the system process the required information, make it easily accessible, and do that with the required accuracy for a user-based search paradigm? Second, can the system do that fast enough to meet the time window of the use case?
It is unclear what type/source of information needs to be processed and then made available for retrieval, how long a search can take and still be considered OK, or what the total latency is (not just retrieval during the search phase) from information acquisition to being searchable.
If you can share those details, the group can provide more specific and better coaching.
------------------------------------------------
Tom Deutsch
Program Director
Information Management
Big Data Technologies
IBM
3565 Harbor Blvd
Costa Mesa, CA 92626-1420
[hidden email]
Twitter: @thomasdeutsch
Data Management Blog: ibmdatamag.com/author/tdeutsch/
LinkedIn: http://www.linkedin.com/profile/view?id=833160
Quora: http://www.quora.com/Tom-Deutsch
Smarter Computing Blog: http://www.smartercomputingblog.com/contributorsprofile/?user_id=223
Big Data for Business Executives Group: http://www.linkedin.com/groups?gid=4455695 |