[jira] Created: (LUCENE-2510) migrate solr analysis factories to analyzers module

JIRA jira@apache.org
migrate solr analysis factories to analyzers module
---------------------------------------------------

                 Key: LUCENE-2510
                 URL: https://issues.apache.org/jira/browse/LUCENE-2510
             Project: Lucene - Java
          Issue Type: Task
          Components: contrib/analyzers
    Affects Versions: 4.0
            Reporter: Robert Muir
             Fix For: 4.0


In LUCENE-2413 all TokenStreams were consolidated into the analyzers module.

This is a good step, but I think the next step is to put the Solr factories into the analyzers module, too.

This would make the analyzers artifacts plugins for both Lucene and Solr, with benefits such as:
* Users could use older versions of the analyzers module with Solr, too. This is a good step toward using real library versions, instead of the Version constant, for backwards compatibility.
* Analyzer modules such as smartcn and icu, which aren't currently available to Solr users due to large file sizes or dependencies, would become simple optional plugins to Solr, easily available to users who want them.
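
As a rough illustration of what "factories as plugins" could mean, here is a minimal, self-contained Java sketch. Everything in it is hypothetical: the `TokenFilterFactory` interface, its `init(Map<String, String>)`/`create(...)` shape (loosely modeled on how Solr configures factories from schema attributes), and the example factory. A plain `List<String>` stands in for a Lucene `TokenStream` so the sketch runs without Lucene on the classpath.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class FactorySketch {

    // Hypothetical factory contract: configured from string arguments
    // (as Solr does from schema attributes), then used to wrap a token stream.
    // A List<String> stands in for a Lucene TokenStream in this sketch.
    public interface TokenFilterFactory {
        void init(Map<String, String> args);
        List<String> create(List<String> input);
    }

    // Hypothetical example factory: drops tokens shorter than "min" characters
    // and optionally lowercases the remaining ones.
    public static class LengthLowerCaseFilterFactory implements TokenFilterFactory {
        private int min = 1;
        private boolean lowerCase = false;

        @Override
        public void init(Map<String, String> args) {
            // Parse string-typed configuration, falling back to defaults.
            min = Integer.parseInt(args.getOrDefault("min", "1"));
            lowerCase = Boolean.parseBoolean(args.getOrDefault("lowerCase", "false"));
        }

        @Override
        public List<String> create(List<String> input) {
            List<String> out = new ArrayList<>();
            for (String token : input) {
                if (token.length() < min) {
                    continue; // too short, filtered out
                }
                out.add(lowerCase ? token.toLowerCase(Locale.ROOT) : token);
            }
            return out;
        }
    }

    public static void main(String[] args) {
        // A Lucene user fills the map in code; Solr would fill it from its schema.
        Map<String, String> conf = new HashMap<>();
        conf.put("min", "3");
        conf.put("lowerCase", "true");

        TokenFilterFactory factory = new LengthLowerCaseFilterFactory();
        factory.init(conf);
        System.out.println(factory.create(List.of("The", "ICU", "is", "Optional")));
    }
}
```

Run as-is, `main` prints `[the, icu, optional]`. The design point is that the factory only consumes string arguments, so the same class could serve Solr's schema-driven configuration and plain programmatic use from Lucene.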

Rough sketch in this thread: http://www.lucidimagination.com/search/document/3465a0e55ba94d58/solr_and_analyzers_module

In practice, I haven't looked into this much and don't have a plan yet for how it will work, so ideas are very welcome.


--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

[jira] Commented: (LUCENE-2510) migrate solr analysis factories to analyzers module

JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/LUCENE-2510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881685#action_12881685 ]

Chris Male commented on LUCENE-2510:
------------------------------------

+1 to this idea.

From my perspective, the only issue is loading the resources used by the factories, which Solr currently handles through the SolrResourceLoader. My suggestion is to migrate Solr's ResourceLoader interface and perhaps provide simple FileResourceLoader and ClassPathResourceLoader implementations that Lucene users can use.
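
A minimal sketch of that split, assuming a single-method `ResourceLoader` interface with the `FileResourceLoader` and `ClassPathResourceLoader` implementations suggested above. Solr's actual interface may differ; the lone `openResource(String)` method and the constructor arguments here are assumptions for illustration.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;

public class ResourceLoaders {

    // Assumed minimal contract: factories ask for a named resource
    // and get back a stream they are responsible for closing.
    public interface ResourceLoader {
        InputStream openResource(String resource) throws IOException;
    }

    // Resolves resource names against a base directory on the file system.
    public static class FileResourceLoader implements ResourceLoader {
        private final File baseDir;

        public FileResourceLoader(File baseDir) {
            this.baseDir = baseDir;
        }

        @Override
        public InputStream openResource(String resource) throws IOException {
            File file = new File(resource);
            // Absolute paths are used as-is; relative ones are resolved against baseDir.
            if (!file.isAbsolute()) {
                file = new File(baseDir, resource);
            }
            return new FileInputStream(file); // throws FileNotFoundException if missing
        }
    }

    // Looks resources up through a ClassLoader, e.g. files packaged in a jar.
    public static class ClassPathResourceLoader implements ResourceLoader {
        private final ClassLoader classLoader;

        public ClassPathResourceLoader(ClassLoader classLoader) {
            this.classLoader = classLoader;
        }

        @Override
        public InputStream openResource(String resource) throws IOException {
            InputStream in = classLoader.getResourceAsStream(resource);
            if (in == null) {
                throw new FileNotFoundException("Resource not found on classpath: " + resource);
            }
            return in;
        }
    }
}
```

With this shape, Solr could keep SolrResourceLoader as one more implementation of the same interface, while the factories themselves only ever see `ResourceLoader`.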

[jira] Commented: (LUCENE-2510) migrate solr analysis factories to analyzers module

JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/LUCENE-2510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881692#action_12881692 ]

Robert Muir commented on LUCENE-2510:
-------------------------------------

Chris, yes I think at a glance this is where I got stuck :)

Related to this, there is already duplicated resource-loading code that it would be nice to clean up.

For example, Lucene and Solr each have their own stopword-loading code, among other things. But I don't really like some of the things Lucene's WordListLoader does:
* Lucene's WordListLoader builds HashMaps and HashSets, which is wasteful since these are always copied into CharArraySets/CharArrayMaps afterwards; Solr's code builds the CharArraySet/CharArrayMap directly.
* The Solr file-loading code has extra features, such as estimating the size of the set/map up front for faster loading.
* The Solr stopword-loading code is more user-friendly, as it ignores byte order marks (BOMs), etc.

I think it would be good to have a single, optimal implementation of this functionality.
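
One possible shape for a single word-list loader that combines the Solr behaviors listed above: read the lines first, strip a leading byte order mark, then build the final set in one step, presized from the line count so it does not need to be copied or rehashed. This is a sketch under assumptions: `WordLists` and its methods are hypothetical names, skipping `#` comment lines is assumed rather than taken from either codebase, and `java.util.HashSet` stands in for Lucene's `CharArraySet` so the example is self-contained.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class WordLists {

    private static final char BOM = '\uFEFF';

    // Reads one word per line, stripping a leading byte order mark from the
    // first line and skipping blank lines and (assumed) '#' comment lines.
    static List<String> readLines(Reader reader) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(reader)) {
            String line;
            boolean first = true;
            while ((line = br.readLine()) != null) {
                if (first && !line.isEmpty() && line.charAt(0) == BOM) {
                    line = line.substring(1); // trim() would not remove U+FEFF
                }
                first = false;
                line = line.trim();
                if (line.isEmpty() || line.startsWith("#")) {
                    continue;
                }
                lines.add(line);
            }
        }
        return lines;
    }

    // Builds the final set directly, sized from the line count.
    // HashSet stands in for Lucene's CharArraySet in this sketch.
    static Set<String> getWordSet(Reader reader, boolean ignoreCase) throws IOException {
        List<String> lines = readLines(reader);
        // Capacity chosen so all entries fit under HashSet's default 0.75 load factor.
        Set<String> words = new HashSet<>((int) (lines.size() / 0.75f) + 1);
        for (String word : lines) {
            words.add(ignoreCase ? word.toLowerCase(Locale.ROOT) : word);
        }
        return words;
    }
}
```

For example, calling `getWordSet` with `ignoreCase = true` on a reader over `"\uFEFFThe\n# comment\n an \n"` yields a set containing exactly `the` and `an`.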


[jira] Commented: (LUCENE-2510) migrate solr analysis factories to analyzers module

JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/LUCENE-2510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881694#action_12881694 ]

Chris Male commented on LUCENE-2510:
------------------------------------

I think if we consolidate how resources are loaded, we can easily reduce word-list loading to a single, optimal implementation.
