Need Help on Indexing and Retrieval Strategy

Jamie Orchard-Hays-4
I have large TEI.2 docs that I am indexing. These are journal issues
with all the typical sections in them. The main unit of organization
and interest is an article which I want to retrieve intact and display
(no problem there).

The question I'm working on is whether to index each article as its
own Solr document in addition to the Solr document for the article's
issue, or to just use a multiValued article field in the issue's Solr
doc. The disadvantage of the multiValued article field is that
whenever I just want to retrieve one complete article, I actually
retrieve all of them--a lot of data.

Thanks for any suggestions,

Jamie

Re: Need Help on Indexing and Retrieval Strategy

Yonik Seeley-2
On 4/25/07, Jamie Orchard-Hays <[hidden email]> wrote:

> I have large TEI.2 docs that I am indexing. These are journal issues
> with all the typical sections in them. The main unit of organization
> and interest is an article which I want to retrieve intact and display
> (no problem there).
>
> The question I'm working on is whether to index each article as its
> own Solr document in addition to the Solr document for the article's
> issue, or to just use a multiValued article field in the issue's Solr
> doc. The disadvantage of the multiValued article field is that
> whenever I just want to retrieve one complete article, I actually
> retrieve all of them--a lot of data.

For full-text search, I'd definitely go for the former (a document per article).
If someone does a query, I assume they would want *articles* sorted by
relevance, and not issues?

-Yonik
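
A rough sketch of the document-per-article approach, in case it helps. It splits one TEI issue into a Solr XML <add> message with one <doc> per article. The markup convention (<div type="article"> with a <head> title) and the field names id, issue_id, title and text are assumptions made for illustration, not anything taken from the actual files or schema; they would need to match the real TEI and schema.xml.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample issue; real TEI.2 files will be structured differently.
SAMPLE = """<TEI.2><text><body>
<div type="article"><head>First Article</head><p>Alpha text.</p></div>
<div type="article"><head>Second Article</head><p>Beta text.</p></div>
</body></text></TEI.2>"""


def issue_to_solr_add(tei_xml: str, issue_id: str) -> str:
    """Return a Solr <add> message with one <doc> per article in the issue."""
    root = ET.fromstring(tei_xml)
    # Assumption: each article is a <div type="article"> element.
    articles = [d for d in root.iter("div") if d.get("type") == "article"]

    add = ET.Element("add")
    for n, div in enumerate(articles, start=1):
        fields = {
            # Hypothetical field names; they must exist in the Solr schema.
            "id": f"{issue_id}-a{n}",
            "issue_id": issue_id,  # links the article back to its issue
            "title": (div.findtext("head") or "").strip(),
            # Crude text extraction: joins all text nodes with spaces and
            # collapses whitespace (the title is included in the body text).
            "text": " ".join(" ".join(div.itertext()).split()),
        }
        doc = ET.SubElement(add, "doc")
        for name, value in fields.items():
            field = ET.SubElement(doc, "field", name=name)
            field.text = value
    return ET.tostring(add, encoding="unicode")


if __name__ == "__main__":
    print(issue_to_solr_add(SAMPLE, "issue-42"))
```

Because every article document carries its issue_id, a single article can be fetched on its own, and an issue's contents can still be listed by querying on that field.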

Re: Need Help on Indexing and Retrieval Strategy

Jamie Orchard-Hays-4
Thanks, Yonik. I was leaning that way. I'm trying to make something a
bit like a simple XTF browser and trying to understand the best way to
index and access by journal issues. (RoR with Solr).

Jamie
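
For access by journal issue with per-article documents, the two lookups might be sketched as below: one query fetches a single article by id, the other lists the articles of one issue. The endpoint URL and the id, issue_id and title field names are assumptions for illustration only; q, fl and rows are standard Solr query parameters.

```python
from urllib.parse import urlencode

# Assumed local Solr endpoint; adjust to the actual installation.
SOLR_SELECT = "http://localhost:8983/solr/select"


def article_url(article_id: str) -> str:
    """URL that retrieves one complete article document by its id."""
    return SOLR_SELECT + "?" + urlencode({"q": f'id:"{article_id}"'})


def issue_articles_url(issue_id: str, rows: int = 50) -> str:
    """URL that lists the articles of one issue, returning only id and title."""
    params = {
        "q": f'issue_id:"{issue_id}"',  # hypothetical field linking article to issue
        "fl": "id,title",               # keep the large text field out of the listing
        "rows": rows,
    }
    return SOLR_SELECT + "?" + urlencode(params)


if __name__ == "__main__":
    print(article_url("issue-42-a1"))
    print(issue_articles_url("issue-42"))
```

Restricting fl in the issue listing avoids pulling every article's full text just to build a table of contents, which was the drawback of the multiValued field.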
