How to create Index ?


Arpit Sharma
I have put the .jar file in C:\lucene. I have also
unzipped it and put all of its directories (such as
analysis, index, store) in C:\lucene.

Now, how do I create an index?
All the text files are in the C:\text directory. I have
the "Lucene in Action" book, and with its help I wrote
an Indexer.java program in C:\lucene, but when I try
to compile it I get lots of errors.
The code should be fine (it is copied and pasted from the book).

I am sure that there is some path problem. What should
I do?

Thanks

Here is the code of Indexer.java:
----------------

/**
 * This code was originally written for
 * Erik's Lucene intro java.net article
 */
public class Indexer {

  public static void main(String[] args) throws Exception {
    if (args.length != 2) {
      throw new Exception("Usage: java " + Indexer.class.getName()
        + " <index dir> <data dir>");
    }

    File indexDir = new File(args[0]);
    File dataDir = new File(args[1]);

    long start = new Date().getTime();
    int numIndexed = index(indexDir, dataDir);
    long end = new Date().getTime();

    System.out.println("Indexing " + numIndexed + " files took "
      + (end - start) + " milliseconds");
  }

  // open an index and start file directory traversal
  public static int index(File indexDir, File dataDir)
    throws IOException {

    if (!dataDir.exists() || !dataDir.isDirectory()) {
      throw new IOException(dataDir
        + " does not exist or is not a directory");
    }

    IndexWriter writer = new IndexWriter(indexDir,
      new StandardAnalyzer(), true);
    writer.setUseCompoundFile(false);

    indexDirectory(writer, dataDir);

    int numIndexed = writer.docCount();
    writer.optimize();
    writer.close();

    return numIndexed;
  }

  // recursive method that calls itself when it finds a directory
  private static void indexDirectory(IndexWriter writer, File dir)
    throws IOException {

    File[] files = dir.listFiles();
    for (int i = 0; i < files.length; i++) {
      File f = files[i];
      if (f.isDirectory()) {
        indexDirectory(writer, f);
      } else if (f.getName().endsWith(".txt")) {
        indexFile(writer, f);
      }
    }
  }

  // method to actually index a file using Lucene
  private static void indexFile(IndexWriter writer, File f)
    throws IOException {

    if (f.isHidden() || !f.exists() || !f.canRead()) {
      return;
    }

    System.out.println("Indexing " + f.getCanonicalPath());

    Document doc = new Document();
    doc.add(Field.Text("contents", new FileReader(f)));
    doc.add(Field.Keyword("filename", f.getCanonicalPath()));
    writer.addDocument(doc);
  }
}

Re: How to create Index ?

Erik Hatcher
Arpit,

It looks like you've omitted the import statements from  
Indexer.java.  The book omits import statements to conserve space,  
but they are important.  The code is provided in its entirety at  
http://www.lucenebook.com

In fact, you could build an index by running the code directly (read  
the README file and follow the instructions first) by typing "ant  
Indexer" and following the prompts.  One of the prompts asks you  
where to put the index itself, and the next prompt asks for the  
directory of text files to index.

     Erik
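
To make the point about imports concrete, here is a minimal sketch of which classes the posted Indexer refers to. The list of class names is an assumption based on reading the posted code against the Lucene 1.4-era package layout; the downloadable book source remains the authoritative version. The small program below (the name ImportCheck is hypothetical) uses only the JDK: it prints one import line per class and reports whether that class can be found on the current classpath, which also helps tell a missing import apart from a missing jar.

```java
import java.util.ArrayList;
import java.util.List;

class ImportCheck {

    // Fully qualified names the posted Indexer appears to use (assumed list).
    static final String[] REQUIRED = {
        "java.io.File",
        "java.io.FileReader",
        "java.io.IOException",
        "java.util.Date",
        "org.apache.lucene.analysis.standard.StandardAnalyzer",
        "org.apache.lucene.document.Document",
        "org.apache.lucene.document.Field",
        "org.apache.lucene.index.IndexWriter"
    };

    // Returns the names that cannot be loaded from the current classpath.
    static List<String> missing(String[] classNames) {
        List<String> result = new ArrayList<String>();
        for (String name : classNames) {
            try {
                // 'false' means: locate the class but do not run its static initializers.
                Class.forName(name, false, ImportCheck.class.getClassLoader());
            } catch (ClassNotFoundException e) {
                result.add(name);
            } catch (LinkageError e) {
                // The class was found but could not be linked; treat it as unusable.
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> absent = missing(REQUIRED);
        for (String name : REQUIRED) {
            String status = absent.contains(name) ? "NOT on classpath" : "ok";
            System.out.println("import " + name + ";   // " + status);
        }
    }
}
```

If the four org.apache.lucene lines are reported as not on the classpath, the Lucene jar is not being passed to the JVM, which is a separate problem from the missing import statements.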



On Sep 19, 2005, at 10:34 PM, Arpit Sharma wrote:

> I have put the .jar file in C:\lucene. I have also
> unzipped it and put all of its directories (such as
> analysis, index, store) in C:\lucene.
>
> Now, how do I create an index?
> All the text files are in the C:\text directory. I have
> the "Lucene in Action" book, and with its help I wrote
> an Indexer.java program in C:\lucene, but when I try
> to compile it I get lots of errors.
> The code should be fine (it is copied and pasted from the book).
>
> I am sure that there is some path problem. What should
> I do?
>
> Thanks

Re: How to create Index ?

Arpit Sharma
Thanks Erik, but things are still not working. The
source code I downloaded does have a README
file, but it says "The JAR files in the lib directory
need to be in your build and execution classpath to
run manually." and I don't think I am able to do that.
Can you please tell me step by step how to do this? I
am really sorry, but I am very new to all this.

I have unpacked the lucene1.4.3.jar file and put its
folder at C:\org\apache\lucene. What should I do next?
Please also tell me how to add classpaths.

Thanks a lot

Re: How to create Index ?

Chris Hostetter-3

Understanding the basics of how to compile and execute Java programs is a
little outside the scope of the Lucene mailing list(s).  You should start
by looking at some tutorials for using Java on Windows, in particular
how to include jar files in your classpath (both when compiling and
running Java applications).  A quick skim of Google results for "java
classpath tutorial" turned up this result, which may be helpful...

  http://www.kevinboone.com/classpath.html
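
As a rough illustration of what "in your classpath" means in practice: the jar does not need to be unpacked. It is passed as-is to both the compiler and the JVM, on Windows with something like `javac -classpath C:\lucene\lucene-1.4.3.jar Indexer.java` and then `java -classpath C:\lucene\lucene-1.4.3.jar;. Indexer C:\index C:\text`. The jar location and the C:\index directory here are hypothetical; substitute wherever the files actually are. The sketch below (the class name ClasspathReport is an assumption, and it uses only the JDK) prints the classpath the JVM was actually started with, so a mistyped or missing jar path shows up immediately.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class ClasspathReport {

    // Splits a classpath string on the platform separator
    // (';' on Windows, ':' on Unix-like systems) and drops empty entries.
    static List<String> entries(String classpath) {
        List<String> result = new ArrayList<String>();
        if (classpath == null) {
            return result;
        }
        for (String entry : classpath.split(Pattern.quote(File.pathSeparator))) {
            if (entry.length() > 0) {
                result.add(entry);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // The JVM exposes its effective classpath through this system property.
        String classpath = System.getProperty("java.class.path");
        for (String entry : entries(classpath)) {
            File f = new File(entry);
            String kind = !f.exists() ? "MISSING"
                        : f.isDirectory() ? "directory" : "file";
            System.out.println(kind + "  " + entry);
        }
    }
}
```

Running it with the same -classpath value used for Indexer should list the Lucene jar as a file; if it is listed as MISSING, the path given on the command line is wrong.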



: Date: Tue, 20 Sep 2005 14:14:36 -0700 (PDT)
: From: Arpit Sharma <[hidden email]>
: Reply-To: [hidden email]
: To: [hidden email]
: Subject: Re: How to create Index ?
:
: Thanks Erik, but things are still not working. The
: source code I downloaded does have a README
: file, but it says "The JAR files in the lib directory
: need to be in your build and execution classpath to
: run manually." and I don't think I am able to do that.
: Can you please tell me step by step how to do this? I
: am really sorry, but I am very new to all this.
:
: I have unpacked the lucene1.4.3.jar file and put its
: folder at C:\org\apache\lucene. What should I do next?
: Please also tell me how to add classpaths.
:
: Thanks a lot



-Hoss

Re: How to create Index ?

Arpit Sharma
In reply to this post by Erik Hatcher
Hi Erik and others,

Can you provide me with the full code for the Indexer program?
I would really appreciate it.

Thanks a lot.

Re: How to create Index ?

Erik Hatcher
Arpit - as was said below, the code is available from the Lucene in  
Action website (URL also below).

     Erik


On Sep 22, 2005, at 2:47 PM, Arpit Sharma wrote:

> Hi Erik and others,
>
> Can you provide me with the full code for the Indexer program?
> I would really appreciate it.
>
> Thanks a lot.
>
> --- Erik Hatcher <[hidden email]> wrote:
>
>> Arpit,
>>
>> It looks like you've omitted the import statements from
>> Indexer.java.  The book omits import statements to conserve space,
>> but they are important.  The code is provided in its entirety at
>> http://www.lucenebook.com
>>
>> In fact, you could build an index by running the code directly (read
>> the README file and follow the instructions first) by typing "ant
>> Indexer" and following the prompts.  One of the prompts asks you
>> where to put the index itself, and the next prompt asks for the
>> directory of text files to index.
>>
>>      Erik

Re: How to create Index ?

Fernando Luiz Engelmann Junior
Has anyone created an index and stored it in a database? I have
an application that uses JDBC, and I'm wondering whether it's possible to
store Lucene's indexes in this database. If any of you
could help me, I would appreciate it.


Erik Hatcher wrote:

> Arpit - as was said below, the code is available from the Lucene in  
> Action website (URL also below).
>
>     Erik
>
>
> On Sep 22, 2005, at 2:47 PM, Arpit Sharma wrote:
>
>> Hi erik and others,
>>
>> Can you provide me the full code for Indexer program.
>> Will really appreciate it.
>>
>> THanks alot.
>>
>> --- Erik Hatcher <[hidden email]> wrote:
>>
>>
>>> Arpit,
>>>
>>> It looks like you've omitted the import statements
>>> from
>>> Indexer.java.  The book omits import statements to
>>> conserve space,
>>> but they are important.  The code is provided in its
>>> entirety at
>>> http://www.lucenebook.com
>>>
>>> In fact, you could build an index by running the
>>> code directly (read
>>> the README file and follow the instructions first)
>>> by typing "ant
>>> Indexer" and following the prompts.  One of the
>>> prompts asks you
>>> where to put the index itself, and the next prompt
>>> asks for the
>>> directory of text files to index.
>>>
>>>      Erik
>>>
>>>
>>>
>>> On Sep 19, 2005, at 10:34 PM, Arpit Sharma wrote:
>>>
>>>
>>>> I have put the .jar file in C:\lucene and I have
>>>>
>>> also
>>>
>>>> unzip it and have also put all the
>>>>
>>> directories(like
>>>
>>>> analysis,index,store) in C:\ lucene.
>>>>
>>>> Now how to create a index ?
>>>> all the text files are in C:\text directory. I
>>>>
>>> have
>>>
>>>> "lucene in action" book and with the help of it I
>>>>
>>> made
>>>
>>>> a  Indexer.java program in C:\lucene and when I
>>>>
>>> tried
>>>
>>>> to compile it it is giving lot's of errors.
>>>> The code is fine(it is copy paste from the book).
>>>>
>>>> I am sure that there is some path problem. What
>>>>
>>> should
>>>
>>>> I do ?
>>>>
>>>> Thanks
>>>>
>>>> Here is the code of the Indexer.java:-
>>>> ----------------
>>>>
>>>> /** * This code was originally written for
>>>>  **   Erik's Lucene intro java.net article */
>>>>
>>>> public class Indexer {
>>>>
>>>>    public static void main(String[] args) throws
>>>> Exception {
>>>>
>>>>        if (args.length != 2) {
>>>>            throw new Exception("Usage: java " + Indexer.class.getName()
>>>>            + " <index dir> <data dir>");
>>>>        }
>>>>
>>>>        File indexDir = new File(args[0]);
>>>>        File dataDir = new File(args[1]);
>>>>
>>>>        long start = new Date().getTime();
>>>>        int numIndexed = index(indexDir, dataDir);
>>>>        long end = new Date().getTime();
>>>>
>>>>        System.out.println("Indexing " + numIndexed + " files took "
>>>>        + (end - start) + " milliseconds");
>>>>
>>>>   }
>>>>
>>>>   // open an index and start file directory traversal
>>>>
>>>>   public static int index(File indexDir, File dataDir)
>>>>       throws IOException {
>>>>           if (!dataDir.exists() || !dataDir.isDirectory()) {
>>>>               throw new IOException(dataDir
>>>>               + " does not exist or is not a directory");
>>>>           }
>>>>
>>>>           IndexWriter writer = new IndexWriter(indexDir,
>>>>           new StandardAnalyzer(), true);
>>>>           writer.setUseCompoundFile(false);
>>>>
>>>>           indexDirectory(writer, dataDir);
>>>>
>>>>           int numIndexed = writer.docCount();
>>>>
>>>>          writer.optimize();
>>>>          writer.close();
>>>>
>>>>          return numIndexed;
>>>>      }
>>>>
>>>>      // recursive method that calls itself when it finds a directory
>>>>
>>>>      private static void indexDirectory(IndexWriter writer, File dir)
>>>>          throws IOException {
>>>>
>>>>          File[] files = dir.listFiles();
>>>>          for (int i = 0; i < files.length; i++) {
>>>>              File f = files[i];
>>>>              if (f.isDirectory()) {
>>>>                  indexDirectory(writer, f);
>>>>              } else if (f.getName().endsWith(".txt")) {
>>>>                indexFile(writer, f);
>>>>              }
>>>>            }
>>>>      }
>>>>
>>>>      // method to actually index a file using Lucene
>>>>
>>>>      private static void indexFile(IndexWriter writer, File f)
>>>>         throws IOException {
>>>>
>>>>         if (f.isHidden() || !f.exists() || !f.canRead()) {
>>>>                 return;
>>>>         }
>>>>
>>>>         System.out.println("Indexing " + f.getCanonicalPath());
>>>>
>>>>         Document doc = new Document();
>>>>         doc.add(Field.Text("contents", new FileReader(f)));
>>>>
>>>>         doc.add(Field.Keyword("filename", f.getCanonicalPath()));
>>>>         writer.addDocument(doc);
>>>>         }
>>>>       }
>>>>



Re: How to create Index ?

Christophe Pettus
Hi,

(First time poster!)

I considered that when working on my application, but I couldn't  
figure out a reason that it would be an advantage over plain flat  
files.  The only possible advantage I could see was distribution (you  
could update the index in one place and have all the dbms clients get  
copies), but I decided to solve that with an RMI solution (a la  
Lucene in Action's examples).  What kind of functionality were you  
looking to gain from storing the indexes in the dbms?

On 22 Sep 2005, at 12:33, Fernando Luiz Engelmann Junior wrote:

> Has anyone created an index and stored it in a database? I
> have an application that uses JDBC, and I'm wondering whether it's
> possible to store the Lucene indexes in this database. If
> any of you could help me, I'd appreciate it.


Re: How to create Index ?

Fernando Luiz Engelmann Junior
I have a portal application installed on a server. I want to store
the index in the DBMS because all the data would then be centralized in
one place (Oracle or MySQL, for example). When I need to do a backup
or move my site to another server, the impact would be smaller than if I
had the index in one place (the filesystem) and the data in another (the
DBMS). Besides, as I see it, if I could store all the information in
the DBMS, I wouldn't have any headaches with security roles or the
like.



Christophe wrote:

> Hi,
>
> (First time poster!)
>
> I considered that when working on my application, but I couldn't  
> figure out a reason that it would be an advantage over plain flat  
> files.  The only possible advantage I could see was distribution (you  
> could update the index in one place and have all the dbms clients get  
> copies), but I decided to solve that with an RMI solution (a la  
> Lucene in Action's examples).  What kind of functionality were you  
> looking to gain from storing the indexes in the dbms?



Re: How to create Index ?

MS
see http://www.dotnetfirebird.org/


 On 9/22/05, Fernando Luiz Engelmann Junior <[hidden email]> wrote:

>
> I have a portal application installed on a server. I want to store
> the index in the DBMS because all the data would then be centralized in
> one place (Oracle or MySQL, for example). When I need to do a backup
> or move my site to another server, the impact would be smaller than if I
> had the index in one place (the filesystem) and the data in another (the
> DBMS). Besides, as I see it, if I could store all the information in
> the DBMS, I wouldn't have any headaches with security roles or the
> like.

Re: How to create Index ?

Christophe Pettus
In reply to this post by Fernando Luiz Engelmann Junior
You could certainly subclass org.apache.lucene.store.Directory to  
create a new concrete class that stored the data in a dbms; in fact,  
the Javadoc for Directory specifically mentions that as a  
possibility.  You could, for example, map "files" in JDBCDirectory (a  
hypothetical class) to a table in the dbms with a text field for the  
name of the "file" and a BLOB as the content (you probably need a  
modification timestamp field, too).  I didn't end up doing it, but it  
does not look particularly difficult.
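
A rough sketch of that table mapping follows, not code from this thread or from Lucene. The class name BlobTableStore, the table name lucene_files, and its column names are all made up for illustration, and a TreeMap stands in for the table so the example runs without a database or a JDBC driver. The method names only loosely echo the kind of file operations a Directory has to provide; check the Javadoc of your Lucene version for the abstract methods a real subclass must implement. Each method's comment notes the SQL statement a JDBC version might issue instead.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical sketch: one table row per index "file".
 *
 * Assumed table layout (names are illustrative):
 *   CREATE TABLE lucene_files (
 *     name     VARCHAR(255) PRIMARY KEY,
 *     modified BIGINT NOT NULL,
 *     content  BLOB NOT NULL)
 */
class BlobTableStore {

    /** One row of the assumed table: timestamp plus BLOB content. */
    private static final class Row {
        final long modified;
        final byte[] content;

        Row(long modified, byte[] content) {
            this.modified = modified;
            this.content = content;
        }
    }

    // Stand-in for the database table, keyed by "file" name.
    private final Map<String, Row> table = new TreeMap<>();

    /** SELECT name FROM lucene_files ORDER BY name */
    String[] list() {
        return table.keySet().toArray(new String[0]);
    }

    /** SELECT 1 FROM lucene_files WHERE name = ? */
    boolean fileExists(String name) {
        return table.containsKey(name);
    }

    /** SELECT modified FROM lucene_files WHERE name = ? */
    long fileModified(String name) {
        return row(name).modified;
    }

    /** Length of the BLOB stored for this name. */
    long fileLength(String name) {
        return row(name).content.length;
    }

    /** INSERT or UPDATE the row, replacing both BLOB and timestamp. */
    void write(String name, byte[] content, long modified) {
        table.put(name, new Row(modified, content.clone()));
    }

    /** SELECT content FROM lucene_files WHERE name = ? */
    byte[] read(String name) {
        return row(name).content.clone();
    }

    /** DELETE FROM lucene_files WHERE name = ? */
    void deleteFile(String name) {
        table.remove(name);
    }

    /** UPDATE lucene_files SET name = ? WHERE name = ? */
    void renameFile(String from, String to) {
        Row r = row(from);
        table.remove(from);
        table.put(to, r);
    }

    private Row row(String name) {
        Row r = table.get(name);
        if (r == null) {
            throw new IllegalArgumentException("no such file: " + name);
        }
        return r;
    }

    public static void main(String[] args) {
        BlobTableStore store = new BlobTableStore();
        store.write("segments", new byte[] {1, 2, 3}, 1000L);
        store.renameFile("segments", "segments.new");
        System.out.println(Arrays.toString(store.list()));
        System.out.println(store.fileLength("segments.new"));
    }
}
```

A real implementation would also need random-access reads and writes within each BLOB as well as locking, which this sketch leaves out entirely.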

On 22 Sep 2005, at 13:54, Fernando Luiz Engelmann Junior wrote:

> I have a portal application installed on a server. I want to
> store the index in the DBMS because all the data would then be
> centralized in one place (Oracle or MySQL, for example). When
> I need to do a backup or move my site to another server, the
> impact would be smaller than if I had the index in one place
> (the filesystem) and the data in another (the DBMS). Besides, as I
> see it, if I could store all the information in the DBMS, I
> wouldn't have any headaches with security roles or the like.


Re: How to create Index ?

Daniel Naber
In reply to this post by MS
On Thursday 22 September 2005 22:59, Madhu Sasidhar, MD wrote:

> see http://www.dotnetfirebird.org/

BTW, it is not necessary to quote 150 lines just to add one line.

Regards
 Daniel

--
http://www.danielnaber.de