Indexing Problem: Where's my data?

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

Indexing Problem: Where's my data?

Michael Griffiths
Hi,

(The first version of this was rejected for spam).

I'm setting up a test instance of Solr, and keep running into the problem of having Solr not work the way I think it should work. Specifically, the data I want to go into the index isn't there after indexing. I'm extracting the data from MSSQL via DataImportHandler, JDBC 4.0.

My data is set up that for every product ID there is one category (hierarchical, but I'm not dealing with that ATM), a family, and a set of attributes (which includes name, etc). After indexing, I get Category, Family, and Product ID - but nothing from my attribute values (STRING_NAME, below) - which is the most useful data.

Is there something wrong with my schema?

I thought it might be that the schema.xml file wasn't respecting the names I assigned via the DataImportHandler; when I changed to the column names in the schema.xml, I picked up Family and Category (previously, it was only product ID).

I'm really banging my head against the wall at this point, so I'd appreciate any help. My step will probably be to do a considerably more complicated denormalization (in terms of the SQL), which would make the Solr end simpler (but that has problems of its own).

Config information below.

Any help appreciated.

Thanks,
Michael

Data Config:


<dataConfig>
    <dataSource driver="com.microsoft.sqlserver.jdbc.SQLServerDriver" url="jdbc:sqlserver://localhost\DEVELOPMENT/Databases/data:1433"  />
    <document name="products">
        <entity onError="continue" name="product" query="select Product_ID,Category_ID from TB_Product">
            <field column="PRODUCT_ID" name="pid" />
            <field column="CATEGORY_ID" name="cid" />

            <entity name="facets" query="select * from TB_PROD_SPECS where PRODUCT_ID=${product.Product_ID}">
                <field column="STRING_VALUE" />
                <field column="NUMERIC_VALUE" />
                <entity name="attributes" query="select ATTRIBUTE_NAME,ATTRIBUTE_TYPE from TB_ATTRIBUTE where ATTRIBUTE_ID=${facets.ATTRIBUTE_ID}">
                    <field column="Attribute_Name" name="Attribute Name" />
                </entity>
            </entity>

            <entity name="category" query="select CATEGORY_NAME,PARENT_CATEGORY from TB_CATEGORY where CATEGORY_ID='${product.Category_ID}'">
                <field column="Category_Name" name="Category" />
                <field column="Parent_Category" name="Parent Category" />
            </entity>

            <entity name="family_id" query="select FAMILY_ID from TB_PROD_FAMILY where Product_ID = ${product.Product_ID}">
                <entity name="family" query="select FAMILY_Name,PARENT_FAMILY_ID,ROOT_FAMILY,CATEGORY_ID from TB_Family where Family_ID = ${family_id.FAMILY_ID}">
                    <field column="FAMILY_NAME" name="Family" />
                    <field column="ROOT_FAMILY" name="Root Family" />
                    <field column="PARENT_FAMILY" name="Parent Family" />
                    <field column="Category_id" name="Category ID" />
                </entity>
            </entity>
        </entity>
    </document>
</dataConfig>


Schema:

    <fields>
        <field name="Product_ID" type="int" indexed="true" stored="true" required="true" />
        <field name="Family_NAME" type="textTight" indexed="true" stored="false" multivalued="true"/>
        <field name="Category_Name" type="textTight" indexed="true" stored="true" multiValued="true" omitNorms="true" />

        <field name="STRING_VALUE" type="textTight" indexed="true" stored="false" multivalued="true"/>
        <field name="ATTRIBUTE_NAME" type="textTight" indexed="true" stored="false" multivalued="true"/>
        <field name="text" type="text" indexed="true" stored="false" multiValued="true"/>

        <dynamicField name="*_i"  type="string"    indexed="true"  stored="true" multivalued="true"/>

    </fields>

    <uniqueKey>Product_ID</uniqueKey>
    <defaultSearchField>text</defaultSearchField>

    <solrQueryParser defaultOperator="OR"/>

    <copyField source="*" dest="text"/>


Reply | Threaded
Open this post in threaded view
|

Re: Indexing Problem: Where's my data?

kenf_nc
for STRING_VALUE, I assume there is a property in the 'select *' results called string_value? if so I'm not sure why it wouldn't work. If not, then that's why, it doesn't have anything to put there.

For ATTRIBUTE_NAME, is it possibly a case issue? you called it 'Attribute_Name' in your query, but ATTRIBUTE_NAME in your schema...just something to check I guess.

Also, not sure why you are using name= in your fields, for example,                    
<field column="PARENT_FAMILY" name="Parent Family" /> 
I thought 'column' was the source field name and 'name' was supposed to be the schema field name and if not there it would assume 'column' name. You don't have a schema field called "Parent Family" so it looks like it's defaulting to column name too which is lucky for you I suppose. But you may want to either remove 'name=' or make it match the schema. (and I may be completely wrong on this, it's been a while since I got DIH going).

Reply | Threaded
Open this post in threaded view
|

Re: Indexing Problem: Where's my data?

Lance Norskog-2
Solr respects case for field names.  Database fields are supplied in
lower-case, so it should be 'attribute_name' and 'string_value'. Also
'product_id', etc.

It is easier if you carefully emulate every detail in the examples,
for example lower-case names.

On Tue, Jul 27, 2010 at 2:59 PM, kenf_nc <[hidden email]> wrote:

>
> for STRING_VALUE, I assume there is a property in the 'select *' results
> called string_value? if so I'm not sure why it wouldn't work. If not, then
> that's why, it doesn't have anything to put there.
>
> For ATTRIBUTE_NAME, is it possibly a case issue? you called it
> 'Attribute_Name' in your query, but ATTRIBUTE_NAME in your schema...just
> something to check I guess.
>
> Also, not sure why you are using name= in your fields, for example,
> <field column="PARENT_FAMILY" name="Parent Family" />
> I thought 'column' was the source field name and 'name' was supposed to be
> the schema field name and if not there it would assume 'column' name. You
> don't have a schema field called "Parent Family" so it looks like it's
> defaulting to column name too which is lucky for you I suppose. But you may
> want to either remove 'name=' or make it match the schema. (and I may be
> completely wrong on this, it's been a while since I got DIH going).
>
>
> --
> View this message in context: http://lucene.472066.n3.nabble.com/Indexing-Problem-Where-s-my-data-tp1000660p1000843.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



--
Lance Norskog
[hidden email]
Reply | Threaded
Open this post in threaded view
|

Re: Indexing Problem: Where's my data?

Chantal Ackermann
make sure to set stored="true" on every field you expect to be returned
in your results for later display.

Chantal


Reply | Threaded
Open this post in threaded view
|

RE: Indexing Problem: Where's my data?

Michael Griffiths
In reply to this post by Lance Norskog-2
Thanks - but my schema.xml is not recognizing field names specified in the data-config.xml.

For example - and I just tested this now - if I have in my data-config.xml:

<field column="product_id" name="pid" />

And then in my schema.xml:

<field name="pid" type="int" indexed="true" stored="true" required="true" />

Then no documents are processed (e.g. I get rows queried, but <str name="Total Documents Processed">0</str> in the data handler UI).

But if I change that to:

<field name="product_id" type="int" indexed="true" stored="true" required="true" />

... now documents are processed (e.g. <str name="Total Documents Processed">313</str>).

Which, quite frankly, confuses me. I may be doing something else wrong (I changed my SQL as well, so I'm getting another failure, but I think it's separate to this one).

-----Original Message-----
From: Lance Norskog [mailto:[hidden email]]
Sent: Tuesday, July 27, 2010 8:25 PM
To: [hidden email]
Subject: Re: Indexing Problem: Where's my data?

Solr respects case for field names.  Database fields are supplied in lower-case, so it should be 'attribute_name' and 'string_value'. Also 'product_id', etc.

It is easier if you carefully emulate every detail in the examples, for example lower-case names.

On Tue, Jul 27, 2010 at 2:59 PM, kenf_nc <[hidden email]> wrote:

>
> for STRING_VALUE, I assume there is a property in the 'select *'
> results called string_value? if so I'm not sure why it wouldn't work.
> If not, then that's why, it doesn't have anything to put there.
>
> For ATTRIBUTE_NAME, is it possibly a case issue? you called it
> 'Attribute_Name' in your query, but ATTRIBUTE_NAME in your
> schema...just something to check I guess.
>
> Also, not sure why you are using name= in your fields, for example,
> <field column="PARENT_FAMILY" name="Parent Family" /> I thought
> 'column' was the source field name and 'name' was supposed to be the
> schema field name and if not there it would assume 'column' name. You
> don't have a schema field called "Parent Family" so it looks like it's
> defaulting to column name too which is lucky for you I suppose. But
> you may want to either remove 'name=' or make it match the schema.
> (and I may be completely wrong on this, it's been a while since I got DIH going).
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Indexing-Problem-Where-s-my-data-tp
> 1000660p1000843.html Sent from the Solr - User mailing list archive at
> Nabble.com.
>



--
Lance Norskog
[hidden email]