Solr Analyzer for Vietnamese


Solr Analyzer for Vietnamese

Eirik Hungnes-2
Hi,

There doesn't seem to be any Tokenizer / Analyzer for Vietnamese built in
to Lucene at the moment. Does anyone know if something like this exists
today or is planned? We found https://github.com/CaoManhDat/VNAnalyzer,
made by Cao Manh Dat, but we're not sure if it's up to date. Any info is
highly appreciated!

Thanks,

Eirik

Re: Solr Analyzer for Vietnamese

Erick Erickson
Eirik:

That code is 4 years old and for Lucene 4. I doubt it applies cleanly
to the current code base. Feel free to give it a try, but it's not
guaranteed to work.

I know of no other Vietnamese analyzers available.

Dat is active in the community, but I don't know whether he plans to
update or commit that bit of code.

Best,
Erick

On Mon, May 22, 2017 at 12:25 AM, Eirik Hungnes wrote:

> There doesn't seem to be any Tokenizer / Analyzer for Vietnamese built in
> to Lucene at the moment. [...]

Re: Solr Analyzer for Vietnamese

Eirik Hungnes
Thanks Erick,

Dat:

Do you have more info about the subject?

2017-05-22 17:08 GMT+02:00 Erick Erickson:

> That code is 4 years old and for Lucene 4. [...]
>



--
Best regards,

Eirik Hungnes
CTO
Rubrikk Group AS

Cell: +4797027732
skypeid: blindkorn44

Re: Solr Analyzer for Vietnamese

Jan Høydahl / Cominvent
Cao, did you see this email from Eirik?

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> On 1 Jun 2017, at 13:33, Eirik Hungnes wrote:
>
> Thanks Erick,
>
> Dat:
>
> Do you have more info about the subject?


Re: Solr Analyzer for Vietnamese

Ahmet Arslan
In reply to this post by Eirik Hungnes-2

Hi Eirik,
I believe the ICU tokenizer does a decent job on text written in non-alphabetic scripts.
Ahmet
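
To get a feel for what boundary-based tokenization produces on Vietnamese, here is a minimal sketch. It is not the ICU tokenizer itself: it uses the JDK's java.text.BreakIterator as a rough stand-in for the ICU break iterator that Lucene's ICUTokenizer builds on, so the output may differ in details. The class name WordBoundaryDemo and the sample phrase "sinh viên Việt Nam" ("Vietnamese student", spelled with Unicode escapes below) are made up for illustration.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical demo class; a stand-in for ICU-style tokenization, not Lucene code.
class WordBoundaryDemo {

    // Split text at word boundaries and keep only the segments that
    // contain at least one letter or digit (drops spaces and punctuation).
    static List<String> tokenize(String text) {
        BreakIterator words = BreakIterator.getWordInstance(Locale.ROOT);
        words.setText(text);
        List<String> tokens = new ArrayList<>();
        int start = words.first();
        for (int end = words.next(); end != BreakIterator.DONE; start = end, end = words.next()) {
            String segment = text.substring(start, end);
            if (segment.codePoints().anyMatch(Character::isLetterOrDigit)) {
                tokens.add(segment);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Sample phrase written with Unicode escapes so the source stays ASCII-only.
        String text = "sinh vi\u00ean Vi\u1ec7t Nam";
        for (String token : tokenize(text)) {
            System.out.println(token);
        }
    }
}
```

Each space-separated syllable comes out as its own token, even though "sinh viên" and "Việt Nam" are each a single word. Whether that is good enough depends on the search requirements; grouping syllables into words is the part that needs a dedicated Vietnamese analyzer.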

On Monday, May 22, 2017, 10:32:22 AM GMT+3, Eirik Hungnes wrote:

> There doesn't seem to be any Tokenizer / Analyzer for Vietnamese built in
> to Lucene at the moment. [...]

Re: Solr Analyzer for Vietnamese

caomanhdat
In reply to this post by Jan Høydahl / Cominvent
Sorry Erick, I had a draft but forgot to press the send button.

I took a look at the analyzer, and it seems it will take some time to
update, but I'm going to spend my weekends updating the repo.

Thanks!

On Thu, Jul 13, 2017 at 7:00 PM Jan Høydahl wrote:

> Cao, did you see this email from Eirik?