MapFile.get() has a bug?

MapFile.get() has a bug?

Feng Jiang-2
Hi all,

I have a MapFile that looks like this:

K -> V
1 -> 1
1 -> 2
1 -> 3
2 -> 1
2 -> 2
2 -> 3
3 -> 1
3 -> 2
3 -> 3

When I call mapFile.get(2, value), the value is filled in as 2, not 1.

Is this a bug in MapFile? I think the reader should be positioned at the first
entry for the named key. Am I right?

Thanks and best regards,

Feng Jiang

Re: MapFile.get() has a bug?

Stefan Groschupf
Hi,

Aren't keys in a map file unique? I'm surprised that you were able to
write such a file.

Stefan

On 27.11.2006, at 22:15, Feng Jiang wrote:

> Hi all,
>
> I have a MapFile that looks like this:
>
> K -> V
> 1 -> 1
> 1 -> 2
> 1 -> 3
> 2 -> 1
> 2 -> 2
> 2 -> 3
> 3 -> 1
> 3 -> 2
> 3 -> 3
>
> When I call mapFile.get(2, value), the value is filled in as 2, not 1.
>
> Is this a bug in MapFile? I think the reader should be positioned at the
> first entry for the named key. Am I right?
>
> Thanks and best regards,
>
> Feng Jiang

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
101tec Inc.
search tech for web 2.1
Menlo Park, California
http://www.101tec.com




Re: MapFile.get() has a bug?

Feng Jiang-2
No. MapFile ONLY requires that keys are appended in non-decreasing order.
Identical keys are allowed.

On 11/28/06, Stefan Groschupf <[hidden email]> wrote:

>
> Hi,
>
> Aren't keys in a map file unique? I'm surprised that you were able to
> write such a file.
>
> Stefan

Re: MapFile.get() has a bug?

Feng Jiang-2
In reply to this post by Stefan Groschupf
The MapFile.Writer.checkKey() method accepts identical keys; it only rejects
a new key that is "less" than the last key.
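
As a rough illustration of the check described above, here is a minimal sketch of a writer that rejects only keys sorting strictly before the previous one. It is a simplified stand-in, not the actual Hadoop source: the class name OrderCheckingWriter and its methods are hypothetical, and a plain Comparator is assumed in place of Hadoop's key comparator.

```java
import java.io.IOException;
import java.util.Comparator;

// Hypothetical stand-in for a sorted-file writer; not the actual Hadoop source.
final class OrderCheckingWriter<K> {
    private final Comparator<? super K> comparator;
    private K lastKey;
    private long size; // number of keys accepted so far

    OrderCheckingWriter(Comparator<? super K> comparator) {
        this.comparator = comparator;
    }

    // Accepts keys in non-decreasing order; rejects a key that sorts before the last one.
    void append(K key) throws IOException {
        // "> 0" lets equal keys through; ">= 0" would also reject duplicates.
        if (size != 0 && comparator.compare(lastKey, key) > 0) {
            throw new IOException("key out of order: " + key + " after " + lastKey);
        }
        lastKey = key;
        size++;
    }

    long size() {
        return size;
    }

    public static void main(String[] args) throws IOException {
        OrderCheckingWriter<Integer> writer =
            new OrderCheckingWriter<>(Comparator.<Integer>naturalOrder());
        writer.append(1);
        writer.append(1); // equal key: accepted
        writer.append(2);
        System.out.println("accepted " + writer.size() + " keys");
        try {
            writer.append(1); // smaller than the last key: rejected
        } catch (IOException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With a strict comparison like this, a run such as 1, 1, 2 is accepted and only a step backwards fails, which is why a file with repeated keys can be written at all.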

I did have such a file, but I am wondering: why is the reader
positioned at the first entry of that named key?

best wishes,

Feng


Re: MapFile.get() has a bug?

Feng Jiang-2
Sorry, I made a typo :)

I do have such a file, but I am wondering: why is the reader NOT
positioned at the first entry of that named key?

On 11/28/06, Feng Jiang <[hidden email]> wrote:

>
> The MapFile.Writer.checkKey() method accepts identical keys; it only rejects
> a new key that is "less" than the last key.
>
> I did have such a file, but I am wondering: why is the reader
> positioned at the first entry of that named key?
>
> best wishes,
>
> Feng

Re: Re: MapFile.get() has a bug?

Albert Chern
Well, I looked at the source and I can tell you WHY it happens, but
I'm not sure if the behavior is correct or not.  Basically the MapFile
keeps an index of where each key is; this index is how the MapFile
seeks quickly to the correct record.  However, there is a parameter
called the index interval controlling how many index entries there
are.  Every time the number of records in the map file hits a multiple
of the index interval, an index entry is written.  Therefore, it is
possible that an index entry is added not for the first occurrence of a
key, but for one of the later ones.  The reader will then seek to one of
those instead of the first.

This does seem to be inconsistent with the fact that you are
allowed to insert equal key records.  I suspect perhaps the developers
meant for MapFile records to be uniquely keyed, but in
MapFile.Writer.checkKey() they used a > where they intended a >= or
something.
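
To make the explanation above concrete, here is a minimal in-memory sketch of a sorted file with a sparse index. It is a toy model under stated assumptions, not Hadoop code: the class SparseIndexModel, the helper sampleFile, and the index intervals 3 and 4 are all illustrative, and the reader is assumed to binary-search the index and then scan forward from the position it finds.

```java
import java.util.ArrayList;
import java.util.List;

// Toy in-memory model of a sorted data file with a sparse index; not Hadoop code.
final class SparseIndexModel {
    private final int indexInterval;
    private final List<int[]> records = new ArrayList<>(); // {key, value}
    private final List<int[]> index = new ArrayList<>();   // {key, record position}

    SparseIndexModel(int indexInterval) {
        this.indexInterval = indexInterval;
    }

    // Every indexInterval-th record gets an index entry, whatever its key is.
    void append(int key, int value) {
        if (records.size() % indexInterval == 0) {
            index.add(new int[] {key, records.size()});
        }
        records.add(new int[] {key, value});
    }

    // Returns the value of the first matching record at or after the seek point, or null.
    Integer get(int key) {
        int lo = 0, hi = index.size() - 1;
        int slot = -1; // index entry to seek to; -1 means start of file
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int c = Integer.compare(index.get(mid)[0], key);
            if (c < 0) {
                slot = mid;
                lo = mid + 1;
            } else if (c > 0) {
                hi = mid - 1;
            } else {
                slot = mid; // exact hit, possibly on a later duplicate
                break;
            }
        }
        int start = slot < 0 ? 0 : index.get(slot)[1];
        for (int i = start; i < records.size(); i++) {
            int[] record = records.get(i);
            if (record[0] == key) return record[1];
            if (record[0] > key) return null;
        }
        return null;
    }

    // Builds the nine-record file from the original post.
    static SparseIndexModel sampleFile(int indexInterval) {
        SparseIndexModel file = new SparseIndexModel(indexInterval);
        for (int key = 1; key <= 3; key++) {
            for (int value = 1; value <= 3; value++) {
                file.append(key, value);
            }
        }
        return file;
    }

    public static void main(String[] args) {
        System.out.println(sampleFile(3).get(2)); // index entry sits on the first record for key 2
        System.out.println(sampleFile(4).get(2)); // index entry sits on the second record for key 2
    }
}
```

In this model an interval of 3 happens to index the first record of each key, so get(2) yields 1, while an interval of 4 indexes the record 2 -> 2, so get(2) yields 2. The outcome depends only on where the interval boundaries fall relative to the runs of equal keys.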

On 11/27/06, Feng Jiang <[hidden email]> wrote:

> Sorry, I made a typo :)
>
> I do have such a file, but I am wondering: why is the reader NOT
> positioned at the first entry of that named key?

Re: Re: MapFile.get() has a bug?

Feng Jiang-2
Thanks, I understand what happened.

But is there any way to work around it?

A single key can have too many values to wrap into one Writable object,
so I have to append (k1, v1), (k1, v2), and so on.

Any ideas?

Feng

On 11/28/06, Albert Chern <[hidden email]> wrote:

>
> Well, I looked at the source and I can tell you WHY it happens, but
> I'm not sure if the behavior is correct or not.  Basically the MapFile
> keeps an index of where each key is; this index is how the MapFile
> seeks quickly to the correct record.  However, there is a parameter
> called the index interval controlling how many index entries there
> are.  Every time the number of records in the map file hits a multiple
> of the index interval, an index entry is written.  Therefore, it is
> possible that an index entry is added not for the first occurrence of a
> key, but for one of the later ones.  The reader will then seek to one of
> those instead of the first.
>
> This does seem to be inconsistent with the fact that you are
> allowed to insert equal key records.  I suspect perhaps the developers
> meant for MapFile records to be uniquely keyed, but in
> MapFile.Writer.checkKey() they used a > where they intended a >= or
> something.

Re: Re: Re: MapFile.get() has a bug?

Albert Chern
Maybe you could wrap the keys in a WritableComparable object that
combines the key with an integer, so you have something like:

( [k1, 0], v1 )
( [k1, 1], v2 )
( [k1, 2], v3 )

Then when you want to read the values for k1, look for [k1, 0], and
keep reading until the key is no longer k1.
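
The composite-key idea above could be sketched roughly as follows. This is an illustration under assumptions, not a Hadoop implementation: SeqKey and CompositeKeyDemo are hypothetical names, a java.util.TreeMap stands in for the sorted MapFile, and a real key class would additionally need to implement Hadoop's WritableComparable so it can be serialized.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// A TreeMap stands in for the sorted file; this is not a Hadoop MapFile.
final class CompositeKeyDemo {
    // Seeks to [key, 0] and keeps reading until the key part changes.
    static List<String> valuesFor(TreeMap<SeqKey, String> file, String key) {
        List<String> values = new ArrayList<>();
        for (Map.Entry<SeqKey, String> e : file.tailMap(new SeqKey(key, 0), true).entrySet()) {
            if (!e.getKey().key.equals(key)) {
                break; // reached the next key's run
            }
            values.add(e.getValue());
        }
        return values;
    }

    public static void main(String[] args) {
        TreeMap<SeqKey, String> file = new TreeMap<>();
        file.put(new SeqKey("k1", 0), "v1");
        file.put(new SeqKey("k1", 1), "v2");
        file.put(new SeqKey("k1", 2), "v3");
        file.put(new SeqKey("k2", 0), "w1");
        System.out.println(valuesFor(file, "k1"));
    }
}

// Hypothetical composite key: the original key plus a sequence number.
final class SeqKey implements Comparable<SeqKey> {
    final String key;
    final int seq;

    SeqKey(String key, int seq) {
        this.key = key;
        this.seq = seq;
    }

    // Orders by the original key first, then by sequence number.
    @Override
    public int compareTo(SeqKey other) {
        int c = key.compareTo(other.key);
        return c != 0 ? c : Integer.compare(seq, other.seq);
    }
}
```

Because every composite key is unique, the lookup for [k1, 0] has exactly one possible target, and the forward scan then collects the remaining values in order.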

On 11/28/06, Feng Jiang <[hidden email]> wrote:

> Thanks, I understand what happened.
>
> But is there any way to work around it?
>
> A single key can have too many values to wrap into one Writable object,
> so I have to append (k1, v1), (k1, v2), and so on.
>
> Any ideas?
>
> Feng

Re: Re: Re: MapFile.get() has a bug?

Feng Jiang-2
Great idea!!! Thank you so much!!!

Best wishes,

Feng

On 11/28/06, Albert Chern <[hidden email]> wrote:

>
> Maybe you could wrap the keys in a WritableComparable object that
> combines the key with an integer, so you have something like:
>
> ( [k1, 0], v1 )
> ( [k1, 1], v2 )
> ( [k1, 2], v3 )
>
> Then when you want to read the values for k1, look for [k1, 0], and
> keep reading until the key is no longer k1.

Re: MapFile.get() has a bug?

Doug Cutting
In reply to this post by Albert Chern
Albert Chern wrote:
> Every time the number of records in the map file hits a multiple
> of the index interval, an index entry is written.  Therefore, it is
> possible that an index entry is added not for the first occurrence of a
> key, but for one of the later ones.  The reader will then seek to one of
> those instead of the first.
>
> This does seem to be inconsistent with the fact that you are
> allowed to insert equal key records.

Yes, I agree that this is confusing and arguably a bug.

> I suspect perhaps the developers
> meant for MapFile records to be uniquely keyed, but in
> MapFile.Writer.checkKey() they used a > where they intended a >= or
> something.

I think what actually happened was that I originally coded it to
prohibit equal keys, then, at some point found an application (somewhere
in Nutch) where equal keys were useful, and changed MapFile to support
them, not realizing the consequences.  Sigh.  I don't know whether Nutch
still relies on this or not.

MapFile could probably be fixed by changing the way the index is
created, to write the location of the first instance of any run of equal
keys.  We could also avoid recording two instances of equal keys in the
index: for a long run of equal keys, we could wait until the key changes
before emitting a new index entry.

Doug
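
One possible reading of the index change proposed above, sketched as a toy in-memory model. This is not a patch to Hadoop: RunAwareIndexModel and sampleFile are hypothetical names, and the rule used here (once an interval boundary has passed, wait for the next record that starts a new run of keys and index that record) is only one way to interpret the proposal.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of an index that only points at the first record of a run; not Hadoop code.
final class RunAwareIndexModel {
    private final int indexInterval;
    private final List<int[]> records = new ArrayList<>(); // {key, value}
    private final List<int[]> index = new ArrayList<>();   // {key, record position}
    private boolean entryDue; // an interval boundary has passed without an index entry

    RunAwareIndexModel(int indexInterval) {
        this.indexInterval = indexInterval;
    }

    // Defers a due index entry until the key changes, then indexes the run's first record.
    void append(int key, int value) {
        if (records.size() % indexInterval == 0) {
            entryDue = true;
        }
        boolean startsRun = records.isEmpty() || records.get(records.size() - 1)[0] != key;
        if (entryDue && startsRun) {
            index.add(new int[] {key, records.size()});
            entryDue = false;
        }
        records.add(new int[] {key, value});
    }

    // Binary-searches the index, then scans forward to the first matching record.
    Integer get(int key) {
        int lo = 0, hi = index.size() - 1;
        int slot = -1; // index entry to seek to; -1 means start of file
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int c = Integer.compare(index.get(mid)[0], key);
            if (c < 0) {
                slot = mid;
                lo = mid + 1;
            } else if (c > 0) {
                hi = mid - 1;
            } else {
                slot = mid; // index keys are unique, so this is the run's first record
                break;
            }
        }
        int start = slot < 0 ? 0 : index.get(slot)[1];
        for (int i = start; i < records.size(); i++) {
            int[] record = records.get(i);
            if (record[0] == key) return record[1];
            if (record[0] > key) return null;
        }
        return null;
    }

    int indexSize() {
        return index.size();
    }

    // Builds the nine-record file from the original post.
    static RunAwareIndexModel sampleFile(int indexInterval) {
        RunAwareIndexModel file = new RunAwareIndexModel(indexInterval);
        for (int key = 1; key <= 3; key++) {
            for (int value = 1; value <= 3; value++) {
                file.append(key, value);
            }
        }
        return file;
    }

    public static void main(String[] args) {
        RunAwareIndexModel file = sampleFile(4);
        System.out.println(file.get(2));      // first value stored for key 2
        System.out.println(file.indexSize()); // number of index entries written
    }
}
```

In this model, with an interval of 4, the sample file gets index entries only at 1 -> 1 and 3 -> 1, and get(2) returns 1. The trade-off is that index entries no longer sit on exact multiples of the interval, and a long run of equal keys can stretch the gap between two entries.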

Re: MapFile.get() has a bug?

Feng Jiang-2
On 11/29/06, Doug Cutting <[hidden email]> wrote:

> MapFile could probably be fixed by changing the way the index is
> created, to write the location of the first instance of any run of equal
> keys.  We could also avoid recording two instances of equal keys in the
> index: for a long run of equal keys, we could wait until the key changes
> before emitting a new index entry.


But then the index interval is no longer a real interval, because runs of
equal keys will cause index entries to be written off the interval
boundaries. I think the index interval exists to reduce the index size when
the MapFile is very large.

I think we could introduce something like a MultiValueMapFile or IndexFile
to do the job, and let MapFile keep to its own principle.

best regards,

Feng
