Papers on text clustering

Grant Ingersoll-2
Hi Mahouters,

I'm looking for papers that you recommend on text clustering (I can,
of course, go search for them, but I'd like recommendations). New,
old, doesn't matter. Either send them here or add them to the wiki at
http://cwiki.apache.org/confluence/display/MAHOUT/Reference+Reading

Thanks,
Grant

Re: Papers on text clustering

IsabelDrost
On Wednesday 11 February 2009, Grant Ingersoll wrote:
> I'm looking for papers that you recommend on text clustering (I can,
> of course, go search for them, but I'd like recommendations).  New,
> old, doesn't matter.  Either send them here or add them to the wiki at
> http://cwiki.apache.org/confluence/display/MAHOUT/Reference+Reading

Hmm, I know a few books that also cover the topic of clustering texts - maybe
one of these would be a good starting point.

I like the book "Introduction to Information Retrieval" by Manning, Raghavan
and Schütze. It also contains some chapters on the topic.

"Data Mining" by Witten and Frank has a chapter on the topic.

"Foundations of Statistical Natural Language Processing" has a chapter as
well.

Are you looking for something in particular?

Isabel


Re: Papers on text clustering

Grant Ingersoll-2
I've read a number of papers on it; I was just looking for items that
people recommend as a way to round out my knowledge of the different
approaches.

I've got the Data Mining book and the Foundations book, so I will
refresh my memory on those as well.


Re: Papers on text clustering

Jason Rennie-2
This might be a good starting point on modern methods:

http://www.cs.princeton.edu/~blei/papers/BleiLafferty2009.pdf

Blei is one of the premier researchers in this area.  Looks like he has lots
of useful info on his home page:

http://www.cs.princeton.edu/~blei/

Cheers,

Jason


--
Jason Rennie
Research Scientist, ITA Software
http://www.itasoftware.com/

Re: Papers on text clustering

Neal Richter-3
In reply to this post by Grant Ingersoll-2
I'll try to help round up some recent survey papers in this area over
the next few days.

Here are two (one paper and one slide deck):
http://eprints.cs.vt.edu/archive/00001000/01/docclust.pdf
http://www.alphaminer.org/document/downloads/TextMining/(ppt)%20Survey%20of%20Text%20Clustering.pdf

My biased opinion at this point is that there aren't many new seminal
works in text clustering. The hardest problem isn't the clustering
algorithm itself; it's deciding which terms and phrases to cluster with
and which to leave out (feature selection).
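
To make the feature-selection point concrete, here is a minimal sketch
of one simple approach, document-frequency pruning: drop terms that
occur in too few documents (noise) or in nearly all of them (stop-word
like). It is not taken from the papers linked above. The function name
select_terms, the parameters min_df and max_df_ratio, the thresholds,
and the toy documents are all assumptions made for illustration.

```python
from collections import Counter


def select_terms(docs, min_df=2, max_df_ratio=0.8):
    """Return the set of terms kept as clustering features.

    A term is kept if it appears in at least `min_df` documents and in
    at most `max_df_ratio` of all documents. Tokenization is a naive
    lowercase whitespace split (an assumption for this sketch).
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        # Count each term once per document, not once per occurrence.
        doc_freq.update(set(doc.lower().split()))
    return {
        term
        for term, count in doc_freq.items()
        if count >= min_df and count / n_docs <= max_df_ratio
    }


# Hypothetical toy corpus.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the cat chased the dog",
    "the bird sang",
]
vocab = select_terms(docs)
```

With this toy corpus, "the" is dropped because it occurs in every
document, and terms seen in only one document are dropped as well. Any
clustering algorithm would then run on vectors built from the
surviving terms only.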
