Problem of indexing pdf files

Problem of indexing pdf files

tirupathi reddy
Hello,
 
    I am getting the following warning messages when I index PDF files with Lucene.
 
 log4j:WARN No appenders could be found for logger (org.pdfbox.pdfparser.PDFParser).
 log4j:WARN Please initialize the log4j system properly.
 
This is the code I am using:
 
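     // Imports needed: java.io.IOException, org.pdfbox.pdmodel.PDDocument,
     // org.pdfbox.util.PDFTextStripper, org.apache.lucene.document.Field
     if (pdf.exists())
     {
         String text = "";
         try {
             PDDocument document = PDDocument.load(pdf); // load the file
             try {
                 PDFTextStripper pts = new PDFTextStripper(); // extract the text
                 text = pts.getText(document);
             } finally {
                 document.close(); // release the PDF even if extraction fails
             }
         }
         catch (IOException e) {
             // An IOException here can also mean a corrupt or unreadable PDF
             System.out.println("Could not read PDF: " + e.getMessage());
         }
         mDocument.add(Field.Text("fulltext", text));
     }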
 
 
    thanx,
 MTREDDY
 
 


Tirupati Reddy Manyam
24-06-08,
Sundugaullee-24,
79110 Freiburg
GERMANY.

Phone: 00497618811257
cell : 004917624649007

__________________________________________________
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around
http://mail.yahoo.com 

Re: Problem of indexing pdf files

mano dasanayaka
Hi ,
 
Lucene by itself cannot index PDF files directly; the text has to be extracted first. There is an ongoing project on SourceForge related to content search called "DocSearcher". It supports indexing PDF and MS Office formats other than PPT, so I think you had better have a look at it. Most importantly, DocSearcher is built on Lucene.
 
The warnings you mentioned are common. You need to add an appender for logging and initialize log4j with a properties file.
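
For illustration, a minimal properties file along those lines might look like the sketch below. It assumes log4j 1.x, which by default looks for a file named log4j.properties at the root of the classpath. The appender name "stdout", the WARN threshold, and the output pattern are arbitrary choices for this example and are not taken from the original post.

```properties
# Assumed setup: log4j 1.x, file saved as log4j.properties on the classpath root.
# Route messages at WARN level or above to a single console appender.
log4j.rootLogger=WARN, stdout

# "stdout" is just a label chosen for this example.
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout

# Timestamp, level, logger name, message, newline.
log4j.appender.stdout.layout.ConversionPattern=%d{ISO8601} %-5p [%c] %m%n
```

With a root logger and at least one appender defined, the "No appenders could be found" warning should no longer be printed, and anything PDFBox logs at WARN or above goes to the console.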
 
 
 
Best Regards,
Mano
 

Re: Problem of indexing pdf files

Otis Gospodnetic-2
That's a log4j warning message, because one of the PDFBox classes is
trying to log something, and you don't have log4j configured
appropriately.  This is not a Lucene issue, and it's a warning, so you
can ignore it if you want.

Otis

