Re[2]: multi word synonym (was Hungarian notation analyzer and phrase queries)

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re[2]: multi word synonym (was Hungarian notation analyzer and phrase queries)

Sven Duzont
Hello,

What about the solution to index every multi-word synonym as a single
token ?
Example :
Phrase to index : "i love jsp and tomcat"
Synonyms        : "jsp" = "java server pages" = "javaserver pages"
Tokens          : i love jsp               and tomcat
                         java server pages
                         javaserver pages
Position        : 0 1    2                 3   4

This solution will have the advantage to solve the phrase query problems.

One will have also to rewrite queries before parsing it with the
QueryParser. for instance the query (tomcat jsp) will be rewrited as
(tomcat (jsp OR "java server pages" OR "javaserver pages"))

Any thoughts ?
Thanks in advance

---
 Sven





mercredi 13 avril 2005, 19:36:44, vous avez ?crit:


CH> : Another approach would be to index this as:
CH> :
CH> : token:       use   power      query for advanced searches
CH> :                     powerquery
CH> : position:    0     1          2     3   4        5
CH> :
CH> : Then use phrase queries with slop=1, to permit a one-token gap when
CH> : someone searches for "use powerquery for advanced searches".

CH> right, but in your example "1" is a magic number that works because we are
CH> only dealing with "multi-word synonyms" of 2 words.  in general, this
CH> approach requires that you pick some some N such that you are garunteed no
CH> synonym contains more then N-1 words, and set the token positions to...

CH>  token:       use   powerquery         for  advanced  searches
CH>                     power      query
CH>  position:    0     N          N+1     2N   3N        4N





CH> -Hoss


CH> ---------------------------------------------------------------------
CH> To unsubscribe, e-mail: [hidden email]
CH> For additional commands, e-mail: [hidden email]



---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Re[2]: multi word synonym (was Hungarian notation analyzer and phrase queries)

Paul Smith-4
Indexing every multi-word synonym as a single token would introduce
spaces into the tokens. In that case searching for (java) would not
match "i love jsp and tomcat". I think that searching for (java*) would
match.

Rewriting the query is also problematic. If you search for (java
server), you don't have a rule to rewrite it to help you find jsp.

Paul

>>> [hidden email] 04/27/05 05:51AM >>>
Hello,

What about the solution to index every multi-word synonym as a single
token ?
Example :
Phrase to index : "i love jsp and tomcat"
Synonyms        : "jsp" = "java server pages" = "javaserver pages"
Tokens          : i love jsp               and tomcat
                         java server pages
                         javaserver pages
Position        : 0 1    2                 3   4

This solution will have the advantage to solve the phrase query
problems.

One will have also to rewrite queries before parsing it with the
QueryParser. for instance the query (tomcat jsp) will be rewrited as
(tomcat (jsp OR "java server pages" OR "javaserver pages"))

Any thoughts ?
Thanks in advance

---
 Sven





mercredi 13 avril 2005, 19:36:44, vous avez ?crit:


CH> : Another approach would be to index this as:
CH> :
CH> : token:       use   power      query for advanced searches
CH> :                     powerquery
CH> : position:    0     1          2     3   4        5
CH> :
CH> : Then use phrase queries with slop=1, to permit a one-token gap
when
CH> : someone searches for "use powerquery for advanced searches".

CH> right, but in your example "1" is a magic number that works because
we are
CH> only dealing with "multi-word synonyms" of 2 words.  in general,
this
CH> approach requires that you pick some some N such that you are
garunteed no
CH> synonym contains more then N-1 words, and set the token positions
to...

CH>  token:       use   powerquery         for  advanced  searches
CH>                     power      query
CH>  position:    0     N          N+1     2N   3N        4N





CH> -Hoss


CH>
---------------------------------------------------------------------
CH> To unsubscribe, e-mail: [hidden email]
CH> For additional commands, e-mail: [hidden email]



---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


http://www.tenfold.com

**********************************************************************
This email and any files transmitted with it are confidential and
intended solely for the use of the individual or entity to whom they
are addressed. If you have received this email in error please notify
the TenFold Postmaster ([hidden email]).
**********************************************************************


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Re[2]: multi word synonym (was Hungarian notation analyzer and phrase queries)

Paul Libbrecht
I knew there was a catch...

I do think, however, that the point is a delicate one which would
consideration: multi-word synonyms are quite common!

paul


Le 29 avr. 05, à 18:47, Paul Smith a écrit :

> Indexing every multi-word synonym as a single token would introduce
> spaces into the tokens. In that case searching for (java) would not
> match "i love jsp and tomcat". I think that searching for (java*) would
> match.
>
> Rewriting the query is also problematic. If you search for (java
> server), you don't have a rule to rewrite it to help you find jsp.
>
> Paul
>
>>>> [hidden email] 04/27/05 05:51AM >>>
> Hello,
>
> What about the solution to index every multi-word synonym as a single
> token ?
> Example :
> Phrase to index : "i love jsp and tomcat"
> Synonyms        : "jsp" = "java server pages" = "javaserver pages"
> Tokens          : i love jsp               and tomcat
>                          java server pages
>                          javaserver pages
> Position        : 0 1    2                 3   4
>
> This solution will have the advantage to solve the phrase query
> problems.
>
> One will have also to rewrite queries before parsing it with the
> QueryParser. for instance the query (tomcat jsp) will be rewrited as
> (tomcat (jsp OR "java server pages" OR "javaserver pages"))
>
> Any thoughts ?
> Thanks in advance
>
> ---
>  Sven
>
>
>
>
>
> mercredi 13 avril 2005, 19:36:44, vous avez écrit:
>
>
> CH> : Another approach would be to index this as:
> CH> :
> CH> : token:       use   power      query for advanced searches
> CH> :                     powerquery
> CH> : position:    0     1          2     3   4        5
> CH> :
> CH> : Then use phrase queries with slop=1, to permit a one-token gap
> when
> CH> : someone searches for "use powerquery for advanced searches".
>
> CH> right, but in your example "1" is a magic number that works because
> we are
> CH> only dealing with "multi-word synonyms" of 2 words.  in general,
> this
> CH> approach requires that you pick some some N such that you are
> garunteed no
> CH> synonym contains more then N-1 words, and set the token positions
> to...
>
> CH>  token:       use   powerquery         for  advanced  searches
> CH>                     power      query
> CH>  position:    0     N          N+1     2N   3N        4N
>
>
>
>
>
> CH> -Hoss
>
>
> CH>
> ---------------------------------------------------------------------
> CH> To unsubscribe, e-mail: [hidden email]
> CH> For additional commands, e-mail: [hidden email]
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>
>
> http://www.tenfold.com
>
> **********************************************************************
> This email and any files transmitted with it are confidential and
> intended solely for the use of the individual or entity to whom they
> are addressed. If you have received this email in error please notify
> the TenFold Postmaster ([hidden email]).
> **********************************************************************
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re[4]: multi word synonym (was Hungarian notation analyzer and phrase queries)

Sven Duzont-2
Hi,

 Thanks for your reply Paul.
 Yes, this was a delicate point.
 I gave up indexing  multi-words synonyms as single token for the
 reason you pointed.

 To handle phraseQueries, i change the positions of the Terms that
 follows the synonyms.
 For instance for the PhraseQuery "jsp and vb developer", with the
 synonyms "vb:virtual basic" and "jsp:java server pages" the positions
 of the terms will be :
-Terms    : jsp vb developer
-Position : 0   3  5

 Doing like this, the queries for "jsp server" will not match.

 I have pieces of code to give if some people are interested

 Thanks again.

Sven

Le vendredi 29 avril 2005 ? 21:58:54, vous ?criviez :

PL> I knew there was a catch...

PL> I do think, however, that the point is a delicate one which would
PL> consideration: multi-word synonyms are quite common!

PL> paul



---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Loading...