Composite key for uniqueKeyId

Jon Baer
Hi,

I'm interested to know whether composite keys are now possible, or whether
there is anything I can do with copyField to get composite keys working for
my doc ids?

Thanks.

-snip-
support for composite keys ... either with some explicit change to the
`<uniqueKey>` declaration or perhaps just copyField with some hidden magic
that concats the resulting terms into a single key Term
-snip-

- Jon

Re: Composite key for uniqueKeyId

Norberto Meijome-2
On Thu, 6 Mar 2008 11:33:38 -0500
Jon Baer <[hidden email]> wrote:

> I'm interested to know whether composite keys are now possible, or whether
> there is anything I can do with copyField to get composite keys working for
> my doc ids?

FWIW, we just do this at doc generation time: grab several fields, massage them into shape, normalise, and assign the result to docID.
B
_________________________
{Beto|Norberto|Numard} Meijome

...using the internet as it was originally intended... for the further research of pornography and pipebombs.

I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.

Re: Composite key for uniqueKeyId

Jon Baer
Hi Norberto,

This sounds like exactly what I'm looking to do. Do you have an example?

(Keep in mind I'm using data-config.xml with the DataImportHandler.)

I'm interested in merging different types of content, e.g.:

NEWS12345
VIDEO12345

So I'd like to end up with different keys per type, if possible.

Thanks.

- Jon


Re: Composite key for uniqueKeyId

hossman

I believe Norberto meant he was handling it in his update client code,
before sending the docs to Solr.

Something that *seems* possible, but that I've never actually tried, is
writing a "ConcatTokenFilterFactory" that queues up all the tokens and joins
them together (using some configured string, defaulting to ""). Then you
could in theory do something like this...

    <fieldType name="compositeKeyType" class="solr.TextField" omitNorms="true">
      <analyzer>
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.ConcatTokenFilterFactory" delim="-"/>
      </analyzer>
    </fieldType>
    ...
    <field name="compositeKey" type="compositeKeyType" />
    <uniqueKey>compositeKey</uniqueKey>
    ...
    <copyField source="type"  dest="compositeKey"/>
    <copyField source="numId" dest="compositeKey"/>
    ...

that *might* work ... but things would be a little weird when viewing your
results (compositeKey would have to be multivalued, and it would return as
an array)
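
To make the joining behaviour concrete, here is a minimal Python sketch of what such a filter would do with the values copied into the field. It is only an illustration of the idea, not Solr or Lucene code: `concat_tokens` and its `delim` parameter are hypothetical names chosen to mirror the `delim` attribute in the config above, and the input values are made up.

```python
from typing import Iterable


def concat_tokens(tokens: Iterable[str], delim: str = "") -> str:
    """Join every token seen for a field into a single key term.

    Mirrors the behaviour described for the hypothetical filter:
    queue up all tokens first, then emit one joined term.
    """
    buffered = list(tokens)  # queue up all the tokens
    return delim.join(buffered)  # join them with the configured string


# Values that copyField would have placed in compositeKey,
# one from "type" and one from "numId".
print(concat_tokens(["NEWS", "12345"], delim="-"))  # NEWS-12345
print(concat_tokens(["NEWS", "12345"]))             # NEWS12345
```

Note that the result depends on the order in which the copied values arrive, so a real filter would need that order to be stable.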


-Hoss


Re: Composite key for uniqueKeyId

Jon Baer
That definitely sounds like the proper way to go, and I will try it. I'm not
too concerned with my keys coming back, just that I can't seem to run the
DataImportHandler without one.

I was able to temporarily get around it by returning it in the entity
query, i.e.:

<entity query="select concat(col1,col2,col3,col4) as id">
   <field name="id" column="id" />
</entity>

BTW, the DataImportHandler still seems to be a "patch". Is there an
estimate of if/when it will appear in trunk?

Thanks!

- Jon


Re: Composite key for uniqueKeyId

Noble Paul നോബിള്‍  नोब्ळ्
Good to hear that people are using DataImportHandler.
In a couple of days, we are providing another patch, cleared by our QA,
with better error handling, messaging and a lot of new features.

A committer will have to decide when it is good enough to be committed.
--Noble


Re: Composite key for uniqueKeyId

Vijay Rao-3
In reply to this post by Jon Baer
I am also looking forward to getting this checked into the trunk.

Will there be a patch with Solr 1.2 support?
Cheers
Vijay


Re: Composite key for uniqueKeyId

Erik Hatcher
The best thing folks can do to help get important patches like this
DataImportHandler committed to trunk is to try it out,
report back experiences, and offer suggestions for improvement.

Solr 1.3 will come in _good_ time, but not before its time.  There  
are many substantial changes in Solr between 1.2 and trunk and some  
more slated.  Knocking out any of these gets us closer to the release  
as well:

<http://issues.apache.org/jira/secure/IssueNavigator.jspa?sorter/field=status&sorter/order=DESC>

        Erik




Re: Composite key for uniqueKeyId

Noble Paul നോബിള്‍  नोब्ळ्
Hi,
The tool is undergoing substantial testing in our QA department.
Because it is also an official internal project, the bugs are filed in
our bug tool. We are fixing them as and when they are reported. It has
gone through some good iterations and it is going to power the backend
for two of our products, which are going to come out in a month's time.
(More in the pipeline.)

Internally it has already had a 1.0  release. The next patch is going
to contain the 1.0 release + a few extra features.

We are testing with a dataset of ~3 million documents. Each document
is built by joining around 6 tables.

This is not to say that it is free of bugs. Please do the testing and
report back any bugs and we will be glad to incorporate the fixes in
the next patch.

--Noble


Re: Composite key for uniqueKeyId

Vijay Rao-3
I used it to index my DB and I found no bugs. Ours is a very simple use case.

There were rough edges though. The error logging and messages were not up to
the mark. It aborted the entire indexing when there was a missing
'required' field. It should just skip that document, or give me an option to
configure that.
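
As a rough illustration of the behaviour being asked for, here is a small Python sketch of a configurable skip-or-abort policy for documents that lack a required field. This is not how the DataImportHandler patch works; `select_indexable`, the `on_missing` option, and the sample documents are all hypothetical.

```python
def select_indexable(docs, required=("id",), on_missing="skip"):
    """Split docs into (kept, skipped) by presence of required fields.

    on_missing="skip"  -> set the offending document aside and carry on
    on_missing="abort" -> raise on the first offending document
    """
    if on_missing not in ("skip", "abort"):
        raise ValueError(f"unknown on_missing policy: {on_missing!r}")

    kept, skipped = [], []
    for doc in docs:
        # A field counts as missing when it is absent or empty.
        missing = [name for name in required if doc.get(name) in (None, "")]
        if not missing:
            kept.append(doc)
        elif on_missing == "abort":
            raise ValueError(f"document {doc!r} is missing required fields: {missing}")
        else:
            skipped.append((doc, missing))
    return kept, skipped


docs = [{"id": "NEWS12345", "title": "ok"}, {"title": "no id here"}]
kept, skipped = select_indexable(docs)
print(len(kept), len(skipped))  # 1 1
```

Returning the skipped documents together with the names of the missing fields would also give the error log something useful to report.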

I am waiting for the next patch to file the bugs.

It gives me some confidence to know that this tool is powering AOL's
infrastructure.

Cheers
Vijay


Re: Composite key for uniqueKeyId

Norberto Meijome-2
In reply to this post by hossman
On Fri, 7 Mar 2008 17:59:48 -0800 (PST)
Chris Hostetter <[hidden email]> wrote:

> I believe Norberto meant he was handling it in his update client code,
> before sending the docs to Solr.

Indeed, this is what we do. We have a process that parses certain files, generates
documents following the Solr schema in use and publishes them to the index.
This process is the one that generates the DocID based on other fields.
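
For anyone wanting a starting point, a minimal Python sketch of this kind of client-side key generation could look like the following. It is an assumption-laden illustration, not the actual process described above: `make_doc_id` is a hypothetical helper, the `type` and `numId` field names are borrowed from the schema example earlier in the thread, and the normalisation (trim and upper-case) is just one possible choice.

```python
def make_doc_id(record: dict, key_fields=("type", "numId")) -> str:
    """Build a composite document id from several fields of a record.

    Each field is normalised (converted to text, trimmed, upper-cased)
    and the parts are concatenated in the order given by key_fields.
    """
    parts = []
    for name in key_fields:
        value = record.get(name)
        if value is None or str(value).strip() == "":
            raise ValueError(f"missing key field: {name}")
        parts.append(str(value).strip().upper())
    return "".join(parts)


# Two records that share a numeric id but differ in content type.
news = {"type": "news", "numId": 12345, "title": "Some headline"}
video = {"type": " Video ", "numId": "12345"}

print(make_doc_id(news))   # NEWS12345
print(make_doc_id(video))  # VIDEO12345
```

The id is computed before the document is sent to Solr, so the schema only needs an ordinary single-valued `uniqueKey` field.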

>
> Something that *seems* possible, but that I've never actually tried, is
> writing a "ConcatTokenFilterFactory" that queues up all the tokens and joins
> them together (using some configured string, defaulting to ""). Then you
> could in theory do something like this...

Yeah, I never tried this because we need somewhat more complex calculations to
be done for the DocId.

cheers,
B

_________________________
{Beto|Norberto|Numard} Meijome

"It's not what you do, it's the love you put into it."
   Mother Theresa.

I speak for myself, not my employer. Contents may be hot. Slippery when wet.
Reading disclaimers makes you go blind. Writing them is worse. You have been
Warned.