Chinese words highlighting


Chinese words highlighting

Lee Li Bin
 

Hi,

Does anyone know how to highlight Chinese characters? When I run the
highlighter, it tends to highlight the whole sentence instead of just the
keywords.

For Chinese highlighting, do I need to use TermVector in order to
highlight the correct keywords?

 

Thanks

 


Re: Custom sorting - memory leaks

Xiaocheng Luan
I intentionally copied the subject line of a thread from last August; an email from that thread is attached at the end of this message.

I ran into similar problems with custom sorting (a memory leak due to caching). The subject was well discussed in that thread, but I want to add a voice and some philosophical thinking, in an attempt to push for a patch:

I think it makes sense to add the isCacheable() method that Chris mentioned to SortComparatorSource (or maybe ScoreDocComparator?). A custom sort is by definition "custom" and may contain anything appropriate for the application, such as a reference to a large, dynamic data structure. Caching the custom sort object indiscriminately may therefore not be the best behavior. Adding isCacheable() would make the custom sort truly (or at least more truly) "custom".
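To make the proposal concrete, here is a minimal sketch of how a cache could honor such a flag. Everything in it is hypothetical: `DocComparator` and `ComparatorSource` are simplified stand-ins for ScoreDocComparator and SortComparatorSource, `ComparatorCache` is an invented class keyed only by field name, and none of this is Lucene's actual caching code. The `default` method is a modern Java convenience used here only to keep existing implementations compiling.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for ScoreDocComparator (not the real Lucene interface).
interface DocComparator {
    int compare(int docA, int docB);
}

// Hypothetical stand-in for SortComparatorSource with the proposed hook added.
interface ComparatorSource {
    DocComparator newComparator(String field);

    // Proposed addition: sources holding large or dynamic state return false.
    default boolean isCacheable() {
        return true;
    }
}

// Invented cache, keyed only by field name to keep the sketch short.
final class ComparatorCache {
    private final Map<String, DocComparator> cache = new HashMap<>();
    private int builds = 0;

    DocComparator get(ComparatorSource source, String field) {
        if (!source.isCacheable()) {
            // Build a fresh comparator and keep no reference to it.
            builds++;
            return source.newComparator(field);
        }
        DocComparator cached = cache.get(field);
        if (cached == null) {
            builds++;
            cached = source.newComparator(field);
            cache.put(field, cached);
        }
        return cached;
    }

    int size() {
        return cache.size();
    }

    int builds() {
        return builds;
    }
}
```

With this shape, a source that returns false from isCacheable() is rebuilt on every request and never retained by the cache, while all other sources keep the current cached behavior.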

Even though there are workarounds such as function queries or a customized hit collector, Lucene's built-in custom sort provides added benefits, such as secondary sorts and relevancy scores, without the need to reinvent the wheel.

The changes required to Lucene itself are minimal. They will certainly create compatibility issues, but I think these are no more difficult to resolve than those created by dropping the deprecated classes.

I'd be more than happy to create a patch if the committers think there is even a slight chance that it may get in.

Thanks,
Xiaocheng

========== Attachment ===========

Chris,

> I see what you're saying now ... yes, for cases like this it probably
> would be useful to have a way to prevent the Comparator from being cached ...
That's what I'm talking about.
I agree this is a very uncommon case.

> perhaps by adding a SortComparatorSource.isCachable() method ... but the
> changes you suggested would completely eliminate the ability for Lucene to
> cache custom comparators at all -- which would be just as bad for many
> people as the current behavior is for you.
It would be great to have some isCachable method, but I understand
all your points about compatibility.
I do not suggest applying my patch; it is just a "works for me" workaround.
Is it possible to add some kind of SortComparatorSource class (rather than
an interface) in new Lucene releases, with the default behaviour of caching
ScoreDocComparators?

> On a related note: have you considered using FunctionQueries (in the Solr
> code base) to do your distance calculations? ... it's been discussed on
> the java-users list a few times ... now that I understand that
> DistanceComparatorSource caches the computed distances for each requested
> "center" point, FunctionQuery certainly seems like a better way to go.
To tell the truth, I can't figure out how to switch from Lucene's sorting
capabilities to FunctionQueries. I have altered the RequestHandler to parse
the SolrQueryRequest and build complex Query, ChainedFilters, and
Sort objects, and I do not want to change all that.

Thanks

Aleksey


 

Re: Chinese words highlighting

Koji Sekiguchi
In reply to this post by Lee Li Bin
One possibility I can think of is that you are using CJKAnalyzer with
Lucene 2.0 or an earlier version.
That combination cannot highlight CJK keywords correctly.
If this is your case, try StandardAnalyzer, or upgrade to Lucene 2.1/2.2
and use its CJKAnalyzer and highlighter.

Also check:
http://issues.apache.org/jira/browse/LUCENE-627
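As background on why the analyzer matters: CJKAnalyzer splits CJK text into overlapping two-character tokens, so a highlighter has to use each token's start and end offsets and merge overlapping matches to mark only the keyword. The sketch below illustrates that idea in plain Java. It is not Lucene's Highlighter; `BigramHighlightSketch`, `Token`, `bigrams`, and `highlight` are hypothetical names, and it assumes one Java `char` per character (no surrogate pairs).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Illustrative only: a stand-in for offset-based highlighting, not Lucene code.
final class BigramHighlightSketch {

    // A token together with the character offsets it covers in the source text.
    static final class Token {
        final String text;
        final int start; // inclusive
        final int end;   // exclusive

        Token(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }
    }

    // Emit overlapping two-character tokens, each carrying its own offsets.
    static List<Token> bigrams(String text) {
        List<Token> tokens = new ArrayList<>();
        for (int i = 0; i + 2 <= text.length(); i++) {
            tokens.add(new Token(text.substring(i, i + 2), i, i + 2));
        }
        return tokens;
    }

    // Wrap every run of characters covered by a matching token in <b>...</b>.
    static String highlight(String text, Set<String> queryTerms) {
        boolean[] marked = new boolean[text.length()];
        for (Token t : bigrams(text)) {
            if (queryTerms.contains(t.text)) {
                for (int i = t.start; i < t.end; i++) {
                    marked[i] = true; // overlapping matches merge into one run
                }
            }
        }
        StringBuilder out = new StringBuilder();
        boolean open = false;
        for (int i = 0; i < text.length(); i++) {
            if (marked[i] && !open) {
                out.append("<b>");
                open = true;
            } else if (!marked[i] && open) {
                out.append("</b>");
                open = false;
            }
            out.append(text.charAt(i));
        }
        if (open) {
            out.append("</b>");
        }
        return out.toString();
    }
}
```

For example, calling `highlight` on the text 我喜欢中文搜索 with the single query term 中文 yields 我喜欢<b>中文</b>搜索. If the offsets were wrong or overlapping tokens were not merged, the marked span could grow well beyond the keyword.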

Hope this helps,

Koji

Lee Li Bin wrote:

> Hi,
>
> Does anyone know how to highlight Chinese characters? When I run the
> highlighter, it tends to highlight the whole sentence instead of just the
> keywords.
>
> For Chinese highlighting, do I need to use TermVector in order to
> highlight the correct keywords?
>
> Thanks

