Which Java objects to index a web page?

Fabrice Estiévenart-2
Hello,

How can I use Nutch Java objects to index one web page (or a very
limited set of pages) without crawling them?

Do I need to use the crawling tools (such as Injector, Generator, ...),
or can I do it by means of lower-level objects (Content,
ParseResult, ...)?

Thanks for your help,

Fabrice
Re: Which Java objects to index a web page ?

Alexander Aristov
Nutch is primarily a crawler. I would suggest taking a look at Solr,
which is just an indexer and searcher. You can use its API as well as
its open interfaces.
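
As a rough illustration of the "open interfaces" route, here is a minimal sketch that pushes one already-fetched page to Solr over plain HTTP, with no crawl involved. It assumes a recent Solr version that accepts JSON documents on its update handler. The field names (id, title, content), the class and method names, and the core name "mycore" in the usage note are placeholders for illustration and have to match your own schema and setup.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper, not part of Nutch or Solr: sends one page to Solr's update handler.
class SolrUpdateSketch {

    // Escape a string so it can be embedded in a JSON string literal.
    static String jsonEscape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.toString();
    }

    // Build the JSON body for one document.
    // Assumption: the Solr schema has fields named id, title and content.
    static String toUpdateBody(String url, String title, String text) {
        return "[{\"id\":\"" + jsonEscape(url) + "\","
             + "\"title\":\"" + jsonEscape(title) + "\","
             + "\"content\":\"" + jsonEscape(text) + "\"}]";
    }

    // POST the body to the given update URL and return the HTTP status code.
    static int post(String solrUpdateUrl, String body) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(solrUpdateUrl))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.statusCode();
    }

    public static void main(String[] args) throws Exception {
        String body = toUpdateBody("http://www.cetic.be", "CETIC", "Page text goes here");
        System.out.println(body);
        // Only contact Solr when an update URL is passed on the command line.
        if (args.length > 0) {
            System.out.println("HTTP status: " + post(args[0], body));
        }
    }
}
```

Run without arguments, it only prints the JSON body. Passing an update URL such as http://localhost:8983/solr/mycore/update?commit=true (host and core name are assumptions) sends the document. Fetching and parsing the page is left out here, and a real setup would more likely use a JSON library or the SolrJ client than hand-built strings.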

Best Regards
Alexander Aristov


2009/8/12 Fabrice Estiévenart <[hidden email]>

> Hello,
>
> How can I use Nutch Java objects to index one (or a very limited set of)
> web page(s) without crawling them ?
>
> Do I need to use the crawling tools (such as Injector, Generator, ...) or
> can I do it by the means of lower-level objects (Content, ParseResult, ...)
> ?
>
> Thanks for your help,
>
> Fabrice
>
Re: Which Java objects to index a web page ?

Fabrice Estiévenart-2
I like using Nutch for the crawlDB, scalability, threading, document
parsing, ... but crawling itself is not important to me, because I index
targeted data sources.

Obviously, I'm using it with Solr for indexing and searching documents.

Fabrice



--
Fabrice Estiévenart, R&D Engineer, CETIC
Tel: +32 (0)71/49.07.28
Web : http://www.cetic.be