Getting the terms for a particular field.

Getting the terms for a particular field.

Lennart Regebro-2
Hi!

I'm trying to get out all of the terms in a field. More specifically,
I'm trying to get a complete list of the UIDs I have indexed.

The best I have come up with so far is to go through all the terms
gotten from the IndexReader.terms() and filter on field(). That works
but feels kinda silly, though I don't know enough of Lucene's internals
to know if it is silly or not. ;-)

I'm using NX/Lucene/PyLucene, if that makes a difference.

--
Lennart Regebro, Nuxeo     http://www.nuxeo.com/
CPS Content Management     http://www.nuxeo.org/

Re: Getting the terms for a particular field.

Chris Hostetter-3

: The best I have come up with so far is to go through all the terms
: gotten from the IndexReader.terms() and filter on field(). That works

you're basically on it, but look at the IndexReader.terms(Term) method
which allows you to start with a specific term, and then bear in mind that
the TermEnum goes in order, which means all of the terms for a single
field will come sequentially, so as soon as you see a field name other
than the one you are interested in, you know you can stop.

if you look at the code for RangeFilter you'll see a good example of
iterating over a TermEnum for a single field ... what you want is
effectively the same work RangeFilter would do when the bounds are
both null.



-Hoss
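
For readers who want to see the seek-then-stop idea in isolation, here is a
minimal sketch. It does not call Lucene at all: a java.util.TreeSet stands in
for the sorted term dictionary, and the names FieldTermScan, T and
textsForField are made up for illustration. The assumption is that terms sort
by field name first and then by text, which is what makes the early exit
valid. In real code the seek would be IndexReader.terms(Term) and the loop
would walk the returned TermEnum.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

class FieldTermScan {
    /** Hypothetical stand-in for a Lucene Term: ordered by field, then by text. */
    static final class T implements Comparable<T> {
        final String field;
        final String text;

        T(String field, String text) {
            this.field = field;
            this.text = text;
        }

        public int compareTo(T other) {
            int byField = field.compareTo(other.field);
            return byField != 0 ? byField : text.compareTo(other.text);
        }
    }

    /** Collects the text of every term in one field, stopping at the first other field. */
    static List<String> textsForField(TreeSet<T> dictionary, String field) {
        List<String> texts = new ArrayList<String>();
        // Seek to the first possible term of the field; the empty text sorts
        // before every other text in that field.
        for (T term : dictionary.tailSet(new T(field, ""))) {
            if (!term.field.equals(field)) {
                break; // sorted order: no more terms of this field can follow
            }
            texts.add(term.text);
        }
        return texts;
    }

    public static void main(String[] args) {
        TreeSet<T> dictionary = new TreeSet<T>();
        dictionary.add(new T("body", "hello"));
        dictionary.add(new T("uid", "b2"));
        dictionary.add(new T("uid", "a1"));
        dictionary.add(new T("zzz", "x"));
        System.out.println(textsForField(dictionary, "uid"));
    }
}
```

The point of the seek is that the scan never touches the terms of fields that
sort before the one you want, and the break means it never touches the ones
that sort after it.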


How to create fields from a txt file for Lucene indexing?

Eder-3
Hi all

I'd like to create fields based on a .txt file, like the following example:

File1.txt
Author: Eder
Description: Indexing txt files in Lucene Tutorial
Category: Software Development

File2.txt
Author: Cecilia
Title: Preventioning Fever
Category: Health y Wellness

So, I'd like to create the fields "Author", "Description", "Title" and
"Category" by reading the files. If I had the text, I would do something
like:

Document doc = new Document();
doc.add(new Field("Author", "Eder", true, true, true)); // store, index, tokenize

But this info is in txt files, so how can I read the file and get the data?


Big hugs,

Eder Rebouças dos Santos
Salvador / BA - Brasil


Re: Getting the terms for a particular field.

Lennart Regebro-2
In reply to this post by Chris Hostetter-3
On 10/25/06, Chris Hostetter <[hidden email]> wrote:

> look at the IndexReader.terms(Term) method which allows you to start
> with a specific term [...] as soon as you see a field name other than
> the one you are interested in, you know you can stop.

That works fine, thanks for the help!


Re: How to create fields from a txt file for Lucene indexing?

Grant Ingersoll-2
In reply to this post by Eder-3
You need to read in the file and parse it according to your business  
rules (just like you would read in any file in your system) and then  
create the appropriate Fields.

-Grant
On Oct 26, 2006, at 11:56 PM, Eder wrote:

> But this info is in txt files, so how can I read the file and get
> the data?

--------------------------
Grant Ingersoll
Sr. Software Engineer
Center for Natural Language Processing
Syracuse University
335 Hinds Hall
Syracuse, NY 13244
http://www.cnlp.org




Re: How to create fields from a txt file for Lucene indexing?

Eder-3

Hi, Grant

Sorry for writing to you directly... I'm a newbie at using Lucene. Could you
give me a practical example of parsing a file? I tried to understand the
luceneweb demo, but it's very complicated.

Thanks a lot!

Eder


----- Original Message -----
From: "Grant Ingersoll" <[hidden email]>
To: <[hidden email]>
Sent: Friday, October 27, 2006 10:43 AM
Subject: Re: How to create fields from a txt file for Lucene indexing?


You need to read in the file and parse it according to your business
rules (just like you would read in any file in your system) and then
create the appropriate Fields.

-Grant

Re: How to create fields from a txt file for Lucene indexing?

Patrek
Hi Eder,

If you are using Java 5, take a look at java.util.Scanner to read your
lines, then use String[] split(String regex)
<http://java.sun.com/j2se/1.5.0/docs/api/java/lang/String.html>
to split on the colon, and read the first element of the array to decide
which field you have.

Hope this helps.

Patrick
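
A minimal sketch of that suggestion, assuming each line of the file looks like
"Name: value" as in Eder's File1.txt. The class and method names
(FieldFileParser, parse) are invented for this example, and the Lucene step is
left as a comment so the snippet runs with the JDK alone.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Scanner;

class FieldFileParser {
    /** Parses "Name: value" lines into a map from field name to value, in file order. */
    static Map<String, String> parse(String text) {
        Map<String, String> fields = new LinkedHashMap<String, String>();
        Scanner scanner = new Scanner(text);
        try {
            while (scanner.hasNextLine()) {
                String line = scanner.nextLine();
                // A limit of 2 splits only on the first colon, so a colon
                // inside the value is kept as part of the value.
                String[] parts = line.split(":", 2);
                if (parts.length < 2) {
                    continue; // blank line or a line without a field name
                }
                fields.put(parts[0].trim(), parts[1].trim());
            }
        } finally {
            scanner.close();
        }
        return fields;
    }

    public static void main(String[] args) {
        String sample = "Author: Eder\n"
                + "Description: Indexing txt files in Lucene Tutorial\n"
                + "Category: Software Development\n";
        for (Map.Entry<String, String> field : parse(sample).entrySet()) {
            // With Lucene on the classpath, this is where each pair would be
            // added to the Document as a Field.
            System.out.println(field.getKey() + " -> " + field.getValue());
        }
    }
}
```

To read from disk instead of a String, new Scanner(File) can replace
new Scanner(text); that constructor throws FileNotFoundException, which the
caller then has to handle.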


On 10/27/06, Eder <[hidden email]> wrote:

> Could you give me a practical example for parsing a file?


Searching dates at Lucene 1.4.3 webindex

Eder-3
Hi all,

First, thanks a lot Patrick and Grant, you were very, very helpful with my last
question! Here is the final code. For it to work, the file must use ":" as the
delimiter around each field name, e.g.:

File.txt
:Title: Web searching in Lucene
:Description: This article presents the results of using a webservice
running Lucene search
:Author: Eder Reboucas
:Content: bla bla bla... etc. e tal


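// Needs: java.io.FileNotFoundException, java.util.Scanner,
// org.apache.lucene.document.Field (f is the File, doc the Document)
try {
    Scanner scanner = new Scanner(f);
    try {
        // Tokens are separated by ":" plus any surrounding whitespace, so
        // they alternate: field name, field content, field name, ...
        scanner.useDelimiter("\\s*:\\s*");
        while (scanner.hasNext()) {
            String campo_nome = scanner.next();   // field name
            if (!scanner.hasNext()) {
                break;                            // name with no content after it
            }
            String campo_cont = scanner.next();   // field content
            // Lucene 1.4.3 constructor flags: store, index, tokenize
            doc.add(new Field(campo_nome, campo_cont, true, true, true));
            System.out.println("Field Name: " + campo_nome + "\nContent: " + campo_cont);
        }
    } finally {
        scanner.close(); // close even if reading fails part-way
    }
} catch (FileNotFoundException e) {
    e.printStackTrace();
}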



I tried to index the file's modification date, as either a String field or a
date field, so that I could search on it in the Lucene 1.4.3 web demo. I tried
two ways:

I) Using

doc.add(Field.Keyword("modified", DateField.timeToString(f.lastModified())));

When I try to print it, it returns "0etut1c5k".


II) Using

Date Data = new Date(f.lastModified());
String arq_data = Data.toLocaleString();
arq_data = arq_data.replaceAll("/", "");
arq_data = arq_data.substring(0, 8);
doc.add(new Field("Data", arq_data, true, true, true));

When I try to print it, it returns "28102006" - the date.
But when I try to search it by the query "Data: 28102006", it doesn't return
the file.


So, I have the following questions:
I - What does the Field.Keyword method return?
II - Could somebody help me by explaining how to get date indexing and
searching working for my job?

Big hugs,

Eder
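
On question I: Field.Keyword itself just builds a Field that is stored and
indexed but not tokenized. The odd-looking value comes from
DateField.timeToString. As far as I recall from the Lucene 1.4 source, it
writes the millisecond timestamp in base 36 (Character.MAX_RADIX) and pads it
with leading zeros to 9 characters, so that the strings sort in time order.
The sketch below reproduces that with the JDK only; DateFieldDecode and its
methods are names invented here, and the padding width is an assumption taken
from the 9-character value in the post.

```java
import java.util.Date;

class DateFieldDecode {
    // Assumed padded width of the encoded string (matches "0etut1c5k").
    static final int DATE_LEN = 9;

    /** Turns a base-36 date string back into milliseconds since the epoch. */
    static long decode(String encoded) {
        return Long.parseLong(encoded, Character.MAX_RADIX);
    }

    /** Encodes milliseconds since the epoch as zero-padded base 36. */
    static String encode(long millis) {
        StringBuilder sb = new StringBuilder(Long.toString(millis, Character.MAX_RADIX));
        while (sb.length() < DATE_LEN) {
            sb.insert(0, '0'); // left-pad so shorter values still sort first
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        long millis = decode("0etut1c5k");
        System.out.println(millis);           // prints 1162087741640
        System.out.println(new Date(millis)); // rendered in the local time zone
        System.out.println(encode(millis));   // prints 0etut1c5k
    }
}
```

So "0etut1c5k" is the file's last-modified time, a moment in late October
2006, not a corrupted value. To search on it you would have to pass the query
a string encoded the same way.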


Re: Searching dates at Lucene 1.4.3 webindex

steve_rowe
Eder wrote:
> But when I try to search it by the query "Data: 28102006", it doesn't
> return the file.

Maybe try removing the space: "Data: 28102006" -> "Data:28102006"?

Re: Searching dates at Lucene 1.4.3 webindex

Eder-3

Hi Steve

It didn't work! Anyway, when I search other fields with the space included,
it works! The problem is with the date manipulation =[

Anyway, thanks a lot! See you!


----- Original Message -----
From: "Steven Rowe" <[hidden email]>
To: <[hidden email]>
Sent: Tuesday, October 31, 2006 1:43 PM
Subject: Re: Searching dates at Lucene 1.4.3 webindex


> Eder wrote:
>> But when I try to search it by the query "Data: 28102006", it doesn't
>> return the file.
>
> Maybe try removing the space: "Data: 28102006" -> "Data:28102006"?
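
The thread does not establish why the "Data" search fails, so the following is
not a diagnosis. It is a sketch of one common way to index dates as plain
strings: format them as yyyyMMdd with a fixed time zone, so that the value does
not depend on the machine's locale (unlike toLocaleString) and so that string
order equals date order. SortableDate and toDayString are names invented for
this example; indexing the result untokenized (for instance with
Field.Keyword, as in approach I) is a suggestion, not something confirmed in
the thread.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

class SortableDate {
    /** Formats a timestamp as yyyyMMdd in the given time zone. */
    static String toDayString(long millis, TimeZone zone) {
        // Locale.ROOT keeps the digits ASCII whatever the default locale is.
        SimpleDateFormat format = new SimpleDateFormat("yyyyMMdd", Locale.ROOT);
        format.setTimeZone(zone);
        return format.format(new Date(millis));
    }

    public static void main(String[] args) {
        long lastModified = 1162087741640L; // the timestamp behind "0etut1c5k"
        System.out.println(toDayString(lastModified, TimeZone.getTimeZone("UTC")));
        System.out.println(toDayString(lastModified, TimeZone.getTimeZone("GMT-03:00")));
    }
}
```

Whichever zone is chosen, the same one has to be used when building the query
string, otherwise a file modified near midnight can land on a different day
than the one searched for.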