TermVector query using Solr Tutorial

Ryan Chan-2
Hello all,

I am following this tutorial:
http://lucene.apache.org/solr/tutorial.html and I am playing with the
TermVectorComponent. Here are my steps:


1. Launch the example server: java -jar start.jar

2. Index monitor.xml with java -jar post.jar monitor.xml. The file
contains the following:

<add><doc>
  <field name="id">3007WFP</field>
  <field name="name">Dell Widescreen UltraSharp 3007WFP</field>
  <field name="manu">Dell, Inc.</field>
  <field name="cat">electronics</field>
  <field name="cat">monitor</field>
  <field name="features">30" TFT active matrix LCD, 2560 x 1600, .25mm
dot pitch, 700:1 contrast</field>
  <field name="includes">USB cable</field>
  <field name="weight">401.6</field>
  <field name="price">2199</field>
  <field name="popularity">6</field>
  <field name="inStock">true</field>
</doc></add>


3. Execute a query for "25". As you can see, "25" appears twice in the
features field. The query is:
http://localhost/solr/select/?q=25&version=2.2&start=0&rows=10&indent=on&qt=tvrh&tv.all=true

4. The term vector in the result does not make sense to me:


<lst name="termVectors">
<lst name="doc-2">
<str name="uniqueKey">3007WFP</str>
<lst name="includes">
<lst name="cabl">
<int name="tf">1</int>
<lst name="offsets">
<int name="start">4</int>
<int name="end">9</int>
</lst>
<lst name="positions">
<int name="position">1</int>
</lst>
<int name="df">1</int>
<double name="tf-idf">1.0</double>
</lst>
<lst name="usb">
<int name="tf">1</int>
<lst name="offsets">
<int name="start">0</int>
<int name="end">3</int>
</lst>
<lst name="positions">
<int name="position">0</int>
</lst>
<int name="df">1</int>
<double name="tf-idf">1.0</double>
</lst>
</lst>
</lst>
<str name="uniqueKeyFieldName">id</str>
</lst>

What I want to know is the relative position of the keywords within a field.

Can anyone explain the above result to me?

Thanks.

Re: TermVector query using Solr Tutorial

Grant Ingersoll-2
Inline...

On Feb 5, 2011, at 4:28 AM, Ryan Chan wrote:

> What I want to know is the relative position of the keywords within a field.
>
> Can anyone explain the above result to me?


It's a little hard to read due to the indentation, but AFAICT you have two terms, usb and "cabl".  USB appears at position 0 and cabl at position 1.  Those are the relative positions to each other.  Perhaps you can explain a bit more what you are trying to do?

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/
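
For readers following along, the response from the first message is easier to read once it is indented. The sketch below re-indents that output and adds comments. The comments are an interpretation, not part of Solr's response: they assume the includes field holds the text "USB cable" from monitor.xml and that its field type applies stemming, which would explain the term cabl.

```xml
<lst name="termVectors">
  <lst name="doc-2">
    <str name="uniqueKey">3007WFP</str>
    <!-- Term vectors are grouped per field; only "includes" is present. -->
    <lst name="includes">
      <!-- "cabl": the indexed (assumed stemmed) form of "cable" -->
      <lst name="cabl">
        <int name="tf">1</int>           <!-- occurrences in this field of this document -->
        <lst name="offsets">
          <int name="start">4</int>      <!-- character offset where "cable" starts in "USB cable" -->
          <int name="end">9</int>        <!-- character offset just past the end of "cable" -->
        </lst>
        <lst name="positions">
          <int name="position">1</int>   <!-- token position: second token, counting from 0 -->
        </lst>
        <int name="df">1</int>           <!-- documents containing this term in this field -->
        <double name="tf-idf">1.0</double>
      </lst>
      <!-- "usb": the lowercased form of "USB" -->
      <lst name="usb">
        <int name="tf">1</int>
        <lst name="offsets">
          <int name="start">0</int>      <!-- "USB" covers characters 0 to 3 (end exclusive) -->
          <int name="end">3</int>
        </lst>
        <lst name="positions">
          <int name="position">0</int>   <!-- first token in the field -->
        </lst>
        <int name="df">1</int>
        <double name="tf-idf">1.0</double>
      </lst>
    </lst>
  </lst>
  <str name="uniqueKeyFieldName">id</str>
</lst>
```

Read this way, the offsets entries are character positions within the field value, and the positions entries are token positions relative to each other.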



Re: TermVector query using Solr Tutorial

Ryan Chan-2
Hello,

On Tue, Feb 8, 2011 at 11:12 PM, Grant Ingersoll <[hidden email]> wrote:
>
> It's a little hard to read due to the indentation, but AFAICT you have two terms, usb and "cabl".  USB appears at position 0 and cabl at position 1.  Those are the relative positions to each other.  Perhaps you can explain a bit more what you are trying to do?

I am searching for the keyword 25 in the field

<field name="features">30" TFT active matrix LCD, 2560 x 1600, .25mm
dot pitch, 700:1 contrast</field>

I want to know the character position of the matched keyword in the
corresponding field.

usb and cabl are not what I want.

Re: TermVector query using Solr Tutorial

Chris Hostetter-3

: I am searching for the keyword 25 in the field
:
: <field name="features">30" TFT active matrix LCD, 2560 x 1600, .25mm
: dot pitch, 700:1 contrast</field>
:
: I want to know the character position of the matched keyword in the
: corresponding field.
:
: usb and cabl are not what I want.

your search is getting a match on the features field, but the termvectors
being returned are from the "includes" field, which you can see based on
the output you mentioned in your previous message...

> <lst name="termVectors">
> <lst name="doc-2">
> <str name="uniqueKey">3007WFP</str>
> <lst name="includes">
> <lst name="cabl">
> <int name="tf">1</int>
...

...by the looks of things, the "includes" field is the only field with
termVectors enabled in your schema.xml (which is consistent with the
trunk, 3x, and solr 1.4 example schemas).

if you want termVectors for the "features" field, you need to specify
termVectors="true" on the "features" field.




-Hoss
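
As a sketch of the change described in the last reply, the features field definition in schema.xml would gain the term vector attributes. The type, indexed, stored and multiValued values shown here are assumptions modelled on the example schema and should stay as they are in your own file. termPositions and termOffsets are included on the assumption that token positions and character offsets are wanted, as in the original question.

```xml
<!-- schema.xml (sketch): only the three term* attributes are new.
     The other attribute values are assumed; keep whatever your existing definition uses. -->
<field name="features" type="text" indexed="true" stored="true" multiValued="true"
       termVectors="true" termPositions="true" termOffsets="true"/>
```

Term vectors are written at index time, so after editing the schema Solr needs a restart and monitor.xml has to be posted again. Adding tv.fl=features to the request from the first message, for example http://localhost/solr/select/?q=25&qt=tvrh&tv.all=true&tv.fl=features, should then limit the term vector output to that field.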