Lucene/Digester

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

Lucene/Digester

Malcolm Clark
Hi all,
I'm using Lucene/Digester etc for my MSc I'm quite new to these API's. I'm trying to obtain advice but it's hard to say whether the problem is Lucene or Digester.
Firstly:
I am trying to index the INEX collection but when I try to index repetitive elements only the last one is indexed. For example:
<Book>
<Name>
<Title>
<Chapter></Chapter>
<Chapter></Chapter>
<Chapter></Chapter> //this is the only one indexed
</Title>
</Name>
</Book>
only the last Chapter element will be indexed and it will skip the first two.
Secondly:
When using the Digester/Lucene with XML does each file have to contain e.g
<!DOCTYPE books PUBLIC "-//LBIN//DTD IEEE Mag//EN" "xmlarticle.dtd" or is
there a way around it?
 I have tried to use the sample line from the Digester API
digester.register("-//Example Dot Com //DTD Sample Example//EN",  "assets/sample.dtd");
but to no avail.

Thanks very much. I really appreciate any possible solutions as I'm stumped.
Malcolm
Scotland
Reply | Threaded
Open this post in threaded view
|

Re: Lucene/Digester

Oren Shir
Hi,
About your first question:
How do you know that only the last "Chapter" element is stored? could it be
that you are only getting the last one? Try using getValues() instead of
get.

Oren

On 10/16/05, Malcolm Clark <[hidden email]> wrote:

>
> Hi all,
> I'm using Lucene/Digester etc for my MSc I'm quite new to these API's. I'm
> trying to obtain advice but it's hard to say whether the problem is Lucene
> or Digester.
> Firstly:
> I am trying to index the INEX collection but when I try to index
> repetitive elements only the last one is indexed. For example:
> <Book>
> <Name>
> <Title>
> <Chapter></Chapter>
> <Chapter></Chapter>
> <Chapter></Chapter> //this is the only one indexed
> </Title>
> </Name>
> </Book>
> only the last Chapter element will be indexed and it will skip the first
> two.
> Secondly:
> When using the Digester/Lucene with XML does each file have to contain e.g
> <!DOCTYPE books PUBLIC "-//LBIN//DTD IEEE Mag//EN" "xmlarticle.dtd" or is
> there a way around it?
> I have tried to use the sample line from the Digester API
> digester.register("-//Example Dot Com //DTD Sample Example//EN",
> "assets/sample.dtd");
> but to no avail.
>
> Thanks very much. I really appreciate any possible solutions as I'm
> stumped.
> Malcolm
> Scotland
>
>
Reply | Threaded
Open this post in threaded view
|

Re: Lucene/Digester

Malcolm Clark

Hi

I used Luke to check the content of the index and they are not there.

cheers,

MC

Reply | Threaded
Open this post in threaded view
|

Re: Lucene/Digester

Grant Ingersoll
In reply to this post by Oren Shir
I would suggest isolating Digester from Lucene and running a unit test
on just the Digester part to ensure you are getting the right things
back from Digester.  Once you are sure your digester setup is correct,
then look at how it feeds into Lucene and post back here if it still
doesn't work

Oren Shir wrote:

>Hi,
>About your first question:
>How do you know that only the last "Chapter" element is stored? could it be
>that you are only getting the last one? Try using getValues() instead of
>get.
>
>Oren
>
>On 10/16/05, Malcolm Clark <[hidden email]> wrote:
>  
>
>>Hi all,
>>I'm using Lucene/Digester etc for my MSc I'm quite new to these API's. I'm
>>trying to obtain advice but it's hard to say whether the problem is Lucene
>>or Digester.
>>Firstly:
>>I am trying to index the INEX collection but when I try to index
>>repetitive elements only the last one is indexed. For example:
>><Book>
>><Name>
>><Title>
>><Chapter></Chapter>
>><Chapter></Chapter>
>><Chapter></Chapter> //this is the only one indexed
>></Title>
>></Name>
>></Book>
>>only the last Chapter element will be indexed and it will skip the first
>>two.
>>Secondly:
>>When using the Digester/Lucene with XML does each file have to contain e.g
>><!DOCTYPE books PUBLIC "-//LBIN//DTD IEEE Mag//EN" "xmlarticle.dtd" or is
>>there a way around it?
>>I have tried to use the sample line from the Digester API
>>digester.register("-//Example Dot Com //DTD Sample Example//EN",
>>"assets/sample.dtd");
>>but to no avail.
>>
>>Thanks very much. I really appreciate any possible solutions as I'm
>>stumped.
>>Malcolm
>>Scotland
>>
>>
>>    
>>
>
>  
>

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: Lucene/Digester

Malcolm Clark

Okay I'll do that. Thanks very much for the advice as it's much appreciated.

Malcolm Clark