[jira] [Created] (LUCENE-4293) ArabicRootsAnalyzer


JIRA jira@apache.org
Ibrahim created LUCENE-4293:
-------------------------------

             Summary: ArabicRootsAnalyzer
                 Key: LUCENE-4293
                 URL: https://issues.apache.org/jira/browse/LUCENE-4293
             Project: Lucene - Core
          Issue Type: New Feature
            Reporter: Ibrahim
            Priority: Minor


ArabicRootsAnalyzer uses an index of Arabic terms, each associated with its root. Every Arabic word has a root, but there is no automatic way of determining it.

This analyzer maps each term to its root, so both searching and indexing are based on roots. It gives me great results in my application.

I have attached all the required files along with the db. The problem with it is the size of the db (16 MB); the number of terms is around 300,000. I have another db with 600,000 terms, but the attached one is summarized and, I believe, better.
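
To make the root-lookup idea concrete, here is a minimal sketch in plain Java. It is not the attached ArabicRootFilter or ArabicRootsAnalyzer code, which is not reproduced in this message; the class name RootLookupSketch, its methods, and the tiny in-memory table are all hypothetical stand-ins for the real term-to-root db. It also assumes that a term missing from the table is kept unchanged, which may or may not be what the attached filter does.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of dictionary-based root lookup (not the attached code).
final class RootLookupSketch {
    // Stand-in for the term-to-root db described in the issue.
    private final Map<String, String> termToRoot;

    RootLookupSketch(Map<String, String> termToRoot) {
        this.termToRoot = new HashMap<>(termToRoot);
    }

    // Returns the stored root, or the term itself when it is not in the table
    // (an assumption; the real filter may handle unknown terms differently).
    String rootOf(String term) {
        return termToRoot.getOrDefault(term, term);
    }

    // Splits on whitespace and replaces every token with its root, so that
    // indexing and searching would both operate on roots.
    List<String> analyze(String text) {
        List<String> roots = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                roots.add(rootOf(token));
            }
        }
        return roots;
    }

    public static void main(String[] args) {
        Map<String, String> table = new HashMap<>();
        table.put("كتاب", "كتب");   // "book"    -> root k-t-b
        table.put("كاتب", "كتب");   // "writer"  -> root k-t-b
        table.put("مكتبة", "كتب");  // "library" -> root k-t-b

        RootLookupSketch lookup = new RootLookupSketch(table);
        // Terms sharing a root collapse to the same indexed form.
        System.out.println(lookup.analyze("كاتب في مكتبة"));
    }
}
```

Because every surface form is reduced to the same root at both index and query time, a search for one form of a word would match documents containing any other form listed in the table. The cost is that the table has to ship with the analyzer, which is where the db size concern comes from.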

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


[jira] [Updated] (LUCENE-4293) ArabicRootsAnalyzer

JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/LUCENE-4293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ibrahim updated LUCENE-4293:
----------------------------

    Attachment: rootsTableIndex.zip
                ArabicTokens.txt
                ArabicTokenizer.java
                ArabicRootsAnalyzer.java
                ArabicRootFilter.java
   

