Dealing with acronyms

Dealing with acronyms

Hannes Carl Meyer
Hi All,

I would like to enable users to do an acronym search on my index.
My idea is the following:

1.) Extract acronyms (ABS, ESP, VCG etc.) from the given document (which
is going to be indexed)

2.) Store the extracted acronyms in a field, for example called "case"

3.) On search, asking the user to use case:"ABS" to search for acronyms
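
The three steps above can be sketched in a few lines. This is only a toy illustration in plain Python, not Lucene code: the regular expression (a standalone run of 2 to 6 capital letters), the helper names and the sample documents are assumptions made for the example. Only the field name "case" comes from the proposal.

```python
import re
from collections import defaultdict

# Assumption: an acronym is a standalone run of 2 to 6 capital letters.
ACRONYM_RE = re.compile(r"\b[A-Z]{2,6}\b")

def extract_acronyms(text):
    """Step 1: return the distinct all-caps tokens, in order of first appearance."""
    seen = []
    for match in ACRONYM_RE.findall(text):
        if match not in seen:
            seen.append(match)
    return seen

def build_index(docs):
    """Step 2: map (field, term) -> set of doc ids.

    The 'body' field holds lowercased words; the 'case' field holds
    the extracted acronyms in their original case.
    """
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in re.findall(r"\w+", text.lower()):
            index[("body", word)].add(doc_id)
        for acronym in extract_acronyms(text):
            index[("case", acronym)].add(doc_id)
    return index

def search(index, field, term):
    """Step 3: look up a single term in a single field, e.g. case:"ABS"."""
    return index.get((field, term), set())

# Hypothetical sample documents.
docs = {
    1: "The ABS and ESP systems were tested.",
    2: "Strong abs help with posture.",
}
index = build_index(docs)
```

With this data, search(index, "case", "ABS") finds only the first document, while search(index, "body", "abs") finds both.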

Any experience with this kind of pattern? Other ideas or best practices?

Thank you in advance and best regards

Hannes

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: Dealing with acronyms

Stefan Will-2
This makes perfect sense to me. Of course the hard part will be how to
extract the acronyms.

-- Stefan


Re: Dealing with acronyms

Find Me
In reply to this post by Hannes Carl Meyer
On 4/26/06, Hannes Carl Meyer <[hidden email]> wrote:
>
> Hi All,
>
> I would like enable users to do an acronym search on my index.
> My idea is the following:
>
> 1.) Extract acronyms (ABS, ESP, VCG etc.) from the given document (which
> is going to be indexed)


In case you haven't looked at it already, you might find this useful:
http://www.cs.waikato.ac.nz/~nzdl/publications/1999/Yeates-Auto-Extract.pdf


> 2.) Store the extracted acronyms in a field, for example called "case"
>
> 3.) On search, asking the user to use case:"ABS" to search for acronyms


I would rather store them in the same field as the other terms, so that you
can run phrase queries. Index the acronyms the same way you would index
synonyms; the book "Lucene in Action" describes how to do that. This would
support queries like "USA President". If you store "USA" in a separate
field, a phrase query like that cannot match, because a phrase cannot span
two fields.
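
A rough sketch of the same-field idea, in plain Python rather than Lucene: every word is indexed lowercased, and an all-caps word is additionally indexed in its original form at the same position, the way a synonym would be. The toy positional index, the function names and the sample documents are assumptions made for illustration; they are not taken from "Lucene in Action".

```python
from collections import defaultdict

def analyze(text):
    """Yield (term, position) pairs for one field.

    Every word is lowercased. An all-caps word of two or more letters
    also yields its original form at the same position (synonym-style).
    """
    for position, word in enumerate(text.split()):
        word = word.strip(".,;:!?\"'()")
        if not word:
            continue
        yield word.lower(), position
        if len(word) >= 2 and word.isalpha() and word.isupper():
            yield word, position

def build_index(docs):
    """Map term -> {doc_id: set of positions}."""
    index = defaultdict(lambda: defaultdict(set))
    for doc_id, text in docs.items():
        for term, position in analyze(text):
            index[term][doc_id].add(position)
    return index

def phrase_search(index, terms):
    """Return ids of docs in which the terms occur at consecutive positions."""
    hits = set()
    for doc_id, positions in index.get(terms[0], {}).items():
        for start in positions:
            if all(start + k in index.get(term, {}).get(doc_id, set())
                   for k, term in enumerate(terms)):
                hits.add(doc_id)
                break
    return hits

# Hypothetical sample documents.
docs = {
    1: "The USA President spoke today.",
    2: "Strong abs help; ABS brakes help too.",
}
index = build_index(docs)
```

Because "USA" and "president" sit at adjacent positions in one field, phrase_search(index, ["USA", "president"]) matches document 1. The exact-case term "ABS" still matches only the acronym, while "abs" matches both occurrences in document 2.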

> Any experience with this kind of pattern? Other ideas or best practices?

I would also look at HMMs/CRFs to extract acronyms. You need to come up with
a list of features that identify a potential acronym. For example:
- The token is in all caps
- The acronym appears repeatedly in the rest of the text
- The token is found in an acronym dictionary, etc.
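
As a small illustration of the three features listed above, here is a sketch that computes them for a single candidate token. It only produces the feature values; training an HMM or CRF on them is out of scope. The function name, the dictionary contents and the sample text are hypothetical.

```python
import re

def acronym_features(token, text, dictionary):
    """Return the three example features for one candidate token."""
    # Count whole-word occurrences of the token in the text.
    occurrences = len(re.findall(r"\b" + re.escape(token) + r"\b", text))
    return {
        "all_caps": token.isalpha() and token.isupper(),
        "repeated": occurrences > 1,
        "in_dictionary": token in dictionary,
    }

# Hypothetical sample data, for illustration only.
KNOWN_ACRONYMS = {"ABS", "ESP"}
text = "ABS prevents wheel lock. With ABS the driver keeps control of the CAR."

abs_features = acronym_features("ABS", text, KNOWN_ACRONYMS)
car_features = acronym_features("CAR", text, KNOWN_ACRONYMS)
```

Here "ABS" has all three features, whereas "CAR" is all caps but appears once and is not in the dictionary, which is the kind of case a trained model would have to weigh.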

Hope this helps,

--Rajesh Munavalli
Blog: http://munavalli.blogspot.com

Re: Dealing with acronyms

Hannes Carl Meyer
Hi,

Thank you, that's good advice. I don't have the "Lucene in Action" book,
but I think it's worth taking a look at it.

So I guess it's done by writing or extending an analyzer?

H.


Re: Dealing with acronyms

Find Me
> So I guess it's done by writing or extending an analyzer?

Yes, that's correct.

--Rajesh Munavalli
Blog: http://munavalli.blogspot.com