Field should accept BytesRef?

Field should accept BytesRef?

Jason Rutherglen
In the Field object, a text value must be of type String; however, I
think we could allow a BytesRef to be passed in?

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Re: Field should accept BytesRef?

Robert Muir
On Sun, May 15, 2011 at 12:05 PM, Jason Rutherglen
<[hidden email]> wrote:
> In the Field object a text value must be of type string, however I
> think we can allow a BytesRef to be passed in?
>

It would be nice if we sorted them in byte order too. I think right
now fields are sorted in UTF-16 order, but terms are sorted in UTF-8
order? (If so, this is confusing.)
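
To illustrate why the two orders can disagree, here is a minimal JDK-only
sketch (no Lucene classes involved; the class and method names are made up
for illustration). It compares one BMP character above the surrogate range
with one supplementary character, first by UTF-16 code units (what
String.compareTo does) and then by unsigned UTF-8 bytes:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Lucene.
class SortOrderDemo {
    // UTF-16 code unit order: this is what String.compareTo implements.
    static int compareUtf16(String a, String b) {
        return a.compareTo(b);
    }

    // Unsigned byte-wise order over the UTF-8 encoding of each string.
    static int compareUtf8(String a, String b) {
        byte[] x = a.getBytes(StandardCharsets.UTF_8);
        byte[] y = b.getBytes(StandardCharsets.UTF_8);
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int d = (x[i] & 0xFF) - (y[i] & 0xFF);
            if (d != 0) {
                return d;
            }
        }
        return x.length - y.length;
    }

    public static void main(String[] args) {
        String bmp = "\uFF21";        // U+FF21, UTF-8 bytes EF BC A1
        String supp = "\uD800\uDC00"; // U+10000 as a surrogate pair, UTF-8 bytes F0 90 80 80

        // In UTF-16 order the BMP character sorts after the surrogate pair.
        System.out.println(compareUtf16(bmp, supp) > 0); // prints true
        // In UTF-8 byte order it sorts before it.
        System.out.println(compareUtf8(bmp, supp) < 0);  // prints true
    }
}
```

The orders only diverge for characters at or above U+E000 compared against
supplementary characters; for everything else they agree.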

RE: Field should accept BytesRef?

Uwe Schindler
Hi,

I think Jason meant the field value, not the field name.

Field names should stay Strings, as they are only "identifiers"; making them BytesRefs is not really useful.

But when you create an untokenized field (or even a binary field, which is stored-only at the moment), you could theoretically index the bytes directly.

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: [hidden email]


Re: Field should accept BytesRef?

Jason Rutherglen
> But when you create an untokenized field (or even a binary field, which is stored-only at the moment), you could theoretically index the bytes directly

Right. If I already have a BytesRef of what needs to be indexed, then
passing the BytesRef into Field/Fieldable should reduce the garbage
created by intermediate Strings?
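
As a rough JDK-only illustration of the detour in question (this does not
show Lucene internals; the class and the viaString helper are hypothetical),
going from bytes to a String and back allocates a String plus a fresh byte
array, and is only lossless when the bytes are valid UTF-8:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not part of Lucene.
class StringDetourDemo {
    // The detour a caller takes when the value already exists as bytes
    // but the API only accepts a String.
    static byte[] viaString(byte[] bytes) {
        String s = new String(bytes, StandardCharsets.UTF_8); // allocates a String
        return s.getBytes(StandardCharsets.UTF_8);            // allocates a new byte[]
    }

    public static void main(String[] args) {
        byte[] term = "lucene".getBytes(StandardCharsets.UTF_8);
        byte[] again = viaString(term);
        System.out.println(Arrays.equals(term, again)); // prints true: same content
        System.out.println(term != again);              // prints true: but a new array

        // Bytes that are not valid UTF-8 are replaced with U+FFFD while decoding,
        // so arbitrary binary terms do not survive the detour.
        byte[] raw = {(byte) 0xFF, (byte) 0xFE};
        System.out.println(Arrays.equals(raw, viaString(raw))); // prints false
    }
}
```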

Re: Field should accept BytesRef?

Robert Muir
On Mon, May 16, 2011 at 11:29 AM, Jason Rutherglen
<[hidden email]> wrote:
>> But when you create an untokenized field (or even a binary field, which is stored-only at the moment), you could theoretically index the bytes directly
>
> Right, if I already have a BytesRef of what needs to be indexed, then
> passing the BR into Field/able should reduce garbage collection of
> strings?
>

You can do this with a TokenStream; see
http://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/src/test/org/apache/lucene/index/Test2BTerms.java
for an example.

(Sorry, I was somehow confused about your message earlier.)
