Oracle Text 10g... or NOT

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

Oracle Text 10g... or NOT

Rene Pineda
Hi -

I'm currently looking into adding full text search capabilities to our
site.  While some threads in this list had the same basic question (RDBMS
full-text versus lucene), their configurations and conderns were different.
Here's my configuration

* RDBMS is Enteprise Oracle 10g
* RAC-enabled RDMBS
* Dual fiber chanel RAID-5 configuration
* 2-node cluster
* 8GB RAM/per node
* Dual 3.6GHz Intel CPU/per node
* 99% of the content to be indexed is stored in our RDBMS
* Largest table size today 3 Billion (with a B) records
* Average table size 3 Million records

The question is, then, should I use Oracle 10g's full text capabilities or
lucene?

Since we have the oracle enteprise license, cost is not an issue (oracle
text comes with it).    I was able to create a demo using lucene in less
than 1/2 day, and we're looking towards creating the same demo using oracle
10g's full text search capabilities

Some ppl in this list migrated from RDBMS to lucene because of:
* speed - lucene is faster
* RDBMS server off load (someone reported they offloaded 70% of db server
work)
* cost (they didn't have the enteprise oracle license)
* index size - lucene indexes are smaller
* while some people had question with interMedia, I didn't find much
information with the newer Oracle 10g's full text search capabilties

Any thoughts?  Thanks in advance.
Reply | Threaded
Open this post in threaded view
|

RE: Oracle Text 10g... or NOT

Bryzek.Michael
We used Oracle interMedia/Text for search within the RDMS beginning with oracle 8i through oracle 10g. Two primary reasons we switched to solr/lucene:

  * We saw random errors (< .1% of the time) when users ran full text search. We believe the source of this error occurred during index update as users ran searches. Oracle support and our team never resolved this issue. We prefer to update our data set 2-4 times per hour and could never find a reliable way to do this with Oracle.

  * When we upgraded to Oracle 10g release 2, the frequency of these errors increased 10 fold and necessitated the change to another solution (Oracle support again could not diagnose root cause of our application errors). We first implemented Lucene, but then found Solr and have been extremely pleased. Solr offers the benefit of a standard XML HTTP API which allows us to expose search to all sorts of applications and partners with no additional effort.

We run oracle on redhat linux, so your mileage may vary. We also run standard edition one now, but oracle text was made part of this edition a few years ago.

In implementing, we've found a few other features that are quite nice:

  * If we change our indexing strategy (e.g. a new analyzer), we can stop the update process, index our data in a separate environment, transfer the new index datafiles to production, and restart the instance. You might be able to do full online rebuilds with Oracle Text, but with lucene it just a non issue.

  * Indexing is fast

  * Scaling search separate from RDBMS is a real blessing

-Mike


-----Original Message-----
From: Rene Pineda [mailto:[hidden email]]
Sent: Tue 10/17/06 12:02 PM
To: [hidden email]
Subject: Oracle Text 10g... or NOT
 
Hi -

I'm currently looking into adding full text search capabilities to our
site.  While some threads in this list had the same basic question (RDBMS
full-text versus lucene), their configurations and conderns were different.
Here's my configuration

* RDBMS is Enteprise Oracle 10g
* RAC-enabled RDMBS
* Dual fiber chanel RAID-5 configuration
* 2-node cluster
* 8GB RAM/per node
* Dual 3.6GHz Intel CPU/per node
* 99% of the content to be indexed is stored in our RDBMS
* Largest table size today 3 Billion (with a B) records
* Average table size 3 Million records

The question is, then, should I use Oracle 10g's full text capabilities or
lucene?

Since we have the oracle enteprise license, cost is not an issue (oracle
text comes with it).    I was able to create a demo using lucene in less
than 1/2 day, and we're looking towards creating the same demo using oracle
10g's full text search capabilities

Some ppl in this list migrated from RDBMS to lucene because of:
* speed - lucene is faster
* RDBMS server off load (someone reported they offloaded 70% of db server
work)
* cost (they didn't have the enteprise oracle license)
* index size - lucene indexes are smaller
* while some people had question with interMedia, I didn't find much
information with the newer Oracle 10g's full text search capabilties

Any thoughts?  Thanks in advance.


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: Oracle Text 10g... or NOT

chrislusf
Several additional reasons I can think of:
1) Being able to control the algorithsm, for example,
   1.1) applying your own analyzer to a field.
   1.2) control your own way of ranking
2) De-couple your data model from the searching
  Searching directly on your data model may not be ideal. You may want
to add more attributes, like "ranking", or de-normed info like tags
for the record.
3) Faster
  Faster is not just one advantage. It's a feature. Because it's fast,
you can add many new features based on that, like google's suggest, or
simply more different kinds of search at one shot.

Lucene Draw back:
1) Not easily to combine search results with the SQL conditions

--
Chris Lu
-------------------------
Instant Full-Text Search On Any Database/Application
site: http://www.dbsight.net
demo: http://search.dbsight.com

On 10/17/06, Bryzek.Michael <[hidden email]> wrote:

> We used Oracle interMedia/Text for search within the RDMS beginning with oracle 8i through oracle 10g. Two primary reasons we switched to solr/lucene:
>
>   * We saw random errors (< .1% of the time) when users ran full text search. We believe the source of this error occurred during index update as users ran searches. Oracle support and our team never resolved this issue. We prefer to update our data set 2-4 times per hour and could never find a reliable way to do this with Oracle.
>
>   * When we upgraded to Oracle 10g release 2, the frequency of these errors increased 10 fold and necessitated the change to another solution (Oracle support again could not diagnose root cause of our application errors). We first implemented Lucene, but then found Solr and have been extremely pleased. Solr offers the benefit of a standard XML HTTP API which allows us to expose search to all sorts of applications and partners with no additional effort.
>
> We run oracle on redhat linux, so your mileage may vary. We also run standard edition one now, but oracle text was made part of this edition a few years ago.
>
> In implementing, we've found a few other features that are quite nice:
>
>   * If we change our indexing strategy (e.g. a new analyzer), we can stop the update process, index our data in a separate environment, transfer the new index datafiles to production, and restart the instance. You might be able to do full online rebuilds with Oracle Text, but with lucene it just a non issue.
>
>   * Indexing is fast
>
>   * Scaling search separate from RDBMS is a real blessing
>
> -Mike
>
>
> -----Original Message-----
> From: Rene Pineda [mailto:[hidden email]]
> Sent: Tue 10/17/06 12:02 PM
> To: [hidden email]
> Subject: Oracle Text 10g... or NOT
>
> Hi -
>
> I'm currently looking into adding full text search capabilities to our
> site.  While some threads in this list had the same basic question (RDBMS
> full-text versus lucene), their configurations and conderns were different.
> Here's my configuration
>
> * RDBMS is Enteprise Oracle 10g
> * RAC-enabled RDMBS
> * Dual fiber chanel RAID-5 configuration
> * 2-node cluster
> * 8GB RAM/per node
> * Dual 3.6GHz Intel CPU/per node
> * 99% of the content to be indexed is stored in our RDBMS
> * Largest table size today 3 Billion (with a B) records
> * Average table size 3 Million records
>
> The question is, then, should I use Oracle 10g's full text capabilities or
> lucene?
>
> Since we have the oracle enteprise license, cost is not an issue (oracle
> text comes with it).    I was able to create a demo using lucene in less
> than 1/2 day, and we're looking towards creating the same demo using oracle
> 10g's full text search capabilities
>
> Some ppl in this list migrated from RDBMS to lucene because of:
> * speed - lucene is faster
> * RDBMS server off load (someone reported they offloaded 70% of db server
> work)
> * cost (they didn't have the enteprise oracle license)
> * index size - lucene indexes are smaller
> * while some people had question with interMedia, I didn't find much
> information with the newer Oracle 10g's full text search capabilties
>
> Any thoughts?  Thanks in advance.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: Oracle Text 10g... or NOT

Greg Colvin
Another option is to run Lucene inside your Oracle instance using  
it's JVM.  This might help with combining Lucene and Oracle search  
results.

On Oct 17, 2006, at 12:39 PM, Chris Lu wrote:

> Several additional reasons I can think of:
> 1) Being able to control the algorithsm, for example,
>   1.1) applying your own analyzer to a field.
>   1.2) control your own way of ranking
> 2) De-couple your data model from the searching
>  Searching directly on your data model may not be ideal. You may want
> to add more attributes, like "ranking", or de-normed info like tags
> for the record.
> 3) Faster
>  Faster is not just one advantage. It's a feature. Because it's fast,
> you can add many new features based on that, like google's suggest, or
> simply more different kinds of search at one shot.
>
> Lucene Draw back:
> 1) Not easily to combine search results with the SQL conditions
>
> --
> Chris Lu
> -------------------------
> Instant Full-Text Search On Any Database/Application
> site: http://www.dbsight.net
> demo: http://search.dbsight.com
>
> On 10/17/06, Bryzek.Michael <[hidden email]> wrote:
>> We used Oracle interMedia/Text for search within the RDMS  
>> beginning with oracle 8i through oracle 10g. Two primary reasons  
>> we switched to solr/lucene:
>>
>>   * We saw random errors (< .1% of the time) when users ran full  
>> text search. We believe the source of this error occurred during  
>> index update as users ran searches. Oracle support and our team  
>> never resolved this issue. We prefer to update our data set 2-4  
>> times per hour and could never find a reliable way to do this with  
>> Oracle.
>>
>>   * When we upgraded to Oracle 10g release 2, the frequency of  
>> these errors increased 10 fold and necessitated the change to  
>> another solution (Oracle support again could not diagnose root  
>> cause of our application errors). We first implemented Lucene, but  
>> then found Solr and have been extremely pleased. Solr offers the  
>> benefit of a standard XML HTTP API which allows us to expose  
>> search to all sorts of applications and partners with no  
>> additional effort.
>>
>> We run oracle on redhat linux, so your mileage may vary. We also  
>> run standard edition one now, but oracle text was made part of  
>> this edition a few years ago.
>>
>> In implementing, we've found a few other features that are quite  
>> nice:
>>
>>   * If we change our indexing strategy (e.g. a new analyzer), we  
>> can stop the update process, index our data in a separate  
>> environment, transfer the new index datafiles to production, and  
>> restart the instance. You might be able to do full online rebuilds  
>> with Oracle Text, but with lucene it just a non issue.
>>
>>   * Indexing is fast
>>
>>   * Scaling search separate from RDBMS is a real blessing
>>
>> -Mike
>>
>>
>> -----Original Message-----
>> From: Rene Pineda [mailto:[hidden email]]
>> Sent: Tue 10/17/06 12:02 PM
>> To: [hidden email]
>> Subject: Oracle Text 10g... or NOT
>>
>> Hi -
>>
>> I'm currently looking into adding full text search capabilities to  
>> our
>> site.  While some threads in this list had the same basic question  
>> (RDBMS
>> full-text versus lucene), their configurations and conderns were  
>> different.
>> Here's my configuration
>>
>> * RDBMS is Enteprise Oracle 10g
>> * RAC-enabled RDMBS
>> * Dual fiber chanel RAID-5 configuration
>> * 2-node cluster
>> * 8GB RAM/per node
>> * Dual 3.6GHz Intel CPU/per node
>> * 99% of the content to be indexed is stored in our RDBMS
>> * Largest table size today 3 Billion (with a B) records
>> * Average table size 3 Million records
>>
>> The question is, then, should I use Oracle 10g's full text  
>> capabilities or
>> lucene?
>>
>> Since we have the oracle enteprise license, cost is not an issue  
>> (oracle
>> text comes with it).    I was able to create a demo using lucene  
>> in less
>> than 1/2 day, and we're looking towards creating the same demo  
>> using oracle
>> 10g's full text search capabilities
>>
>> Some ppl in this list migrated from RDBMS to lucene because of:
>> * speed - lucene is faster
>> * RDBMS server off load (someone reported they offloaded 70% of db  
>> server
>> work)
>> * cost (they didn't have the enteprise oracle license)
>> * index size - lucene indexes are smaller
>> * while some people had question with interMedia, I didn't find much
>> information with the newer Oracle 10g's full text search capabilties
>>
>> Any thoughts?  Thanks in advance.
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [hidden email]
>> For additional commands, e-mail: [hidden email]
>>
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: Oracle Text 10g... or NOT

Kuassi Mensah
Right, as described in my book,

> The Oracle database furnishes an embedded Java run time, which can be   >
used by database components such as XDB, *inter*Media, Spatial,
Text,          > XQuery, and so on. Oracle Text leverages the XML DB
framework, which     > includes a protocol server and a specialized Java
Servlet runner, all running > within the database.

Kuassi
- blog http://db360.blogspot.com/
- book http://db360.blogspot.com/2006/08/oracle-database-programming-using-java_01.html




On 10/17/06, Greg Colvin <[hidden email]> wrote:

>
> Another option is to run Lucene inside your Oracle instance using
> it's JVM.  This might help with combining Lucene and Oracle search
> results.
>
> On Oct 17, 2006, at 12:39 PM, Chris Lu wrote:
> > Several additional reasons I can think of:
> > 1) Being able to control the algorithsm, for example,
> >   1.1) applying your own analyzer to a field.
> >   1.2) control your own way of ranking
> > 2) De-couple your data model from the searching
> >  Searching directly on your data model may not be ideal. You may want
> > to add more attributes, like "ranking", or de-normed info like tags
> > for the record.
> > 3) Faster
> >  Faster is not just one advantage. It's a feature. Because it's fast,
> > you can add many new features based on that, like google's suggest, or
> > simply more different kinds of search at one shot.
> >
> > Lucene Draw back:
> > 1) Not easily to combine search results with the SQL conditions
> >
> > --
> > Chris Lu
> > -------------------------
> > Instant Full-Text Search On Any Database/Application
> > site: http://www.dbsight.net
> > demo: http://search.dbsight.com
> >
> > On 10/17/06, Bryzek.Michael <[hidden email]> wrote:
> >> We used Oracle interMedia/Text for search within the RDMS
> >> beginning with oracle 8i through oracle 10g. Two primary reasons
> >> we switched to solr/lucene:
> >>
> >>   * We saw random errors (< .1% of the time) when users ran full
> >> text search. We believe the source of this error occurred during
> >> index update as users ran searches. Oracle support and our team
> >> never resolved this issue. We prefer to update our data set 2-4
> >> times per hour and could never find a reliable way to do this with
> >> Oracle.
> >>
> >>   * When we upgraded to Oracle 10g release 2, the frequency of
> >> these errors increased 10 fold and necessitated the change to
> >> another solution (Oracle support again could not diagnose root
> >> cause of our application errors). We first implemented Lucene, but
> >> then found Solr and have been extremely pleased. Solr offers the
> >> benefit of a standard XML HTTP API which allows us to expose
> >> search to all sorts of applications and partners with no
> >> additional effort.
> >>
> >> We run oracle on redhat linux, so your mileage may vary. We also
> >> run standard edition one now, but oracle text was made part of
> >> this edition a few years ago.
> >>
> >> In implementing, we've found a few other features that are quite
> >> nice:
> >>
> >>   * If we change our indexing strategy (e.g. a new analyzer), we
> >> can stop the update process, index our data in a separate
> >> environment, transfer the new index datafiles to production, and
> >> restart the instance. You might be able to do full online rebuilds
> >> with Oracle Text, but with lucene it just a non issue.
> >>
> >>   * Indexing is fast
> >>
> >>   * Scaling search separate from RDBMS is a real blessing
> >>
> >> -Mike
> >>
> >>
> >> -----Original Message-----
> >> From: Rene Pineda [mailto:[hidden email]]
> >> Sent: Tue 10/17/06 12:02 PM
> >> To: [hidden email]
> >> Subject: Oracle Text 10g... or NOT
> >>
> >> Hi -
> >>
> >> I'm currently looking into adding full text search capabilities to
> >> our
> >> site.  While some threads in this list had the same basic question
> >> (RDBMS
> >> full-text versus lucene), their configurations and conderns were
> >> different.
> >> Here's my configuration
> >>
> >> * RDBMS is Enteprise Oracle 10g
> >> * RAC-enabled RDMBS
> >> * Dual fiber chanel RAID-5 configuration
> >> * 2-node cluster
> >> * 8GB RAM/per node
> >> * Dual 3.6GHz Intel CPU/per node
> >> * 99% of the content to be indexed is stored in our RDBMS
> >> * Largest table size today 3 Billion (with a B) records
> >> * Average table size 3 Million records
> >>
> >> The question is, then, should I use Oracle 10g's full text
> >> capabilities or
> >> lucene?
> >>
> >> Since we have the oracle enteprise license, cost is not an issue
> >> (oracle
> >> text comes with it).    I was able to create a demo using lucene
> >> in less
> >> than 1/2 day, and we're looking towards creating the same demo
> >> using oracle
> >> 10g's full text search capabilities
> >>
> >> Some ppl in this list migrated from RDBMS to lucene because of:
> >> * speed - lucene is faster
> >> * RDBMS server off load (someone reported they offloaded 70% of db
> >> server
> >> work)
> >> * cost (they didn't have the enteprise oracle license)
> >> * index size - lucene indexes are smaller
> >> * while some people had question with interMedia, I didn't find much
> >> information with the newer Oracle 10g's full text search capabilties
> >>
> >> Any thoughts?  Thanks in advance.
> >>
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: [hidden email]
> >> For additional commands, e-mail: [hidden email]
> >>
> >>
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [hidden email]
> > For additional commands, e-mail: [hidden email]
> >
>
>