tweak to analysis.jsp for payload

Tricia Williams-2
Hi,

    I think that displaying the payload (if one exists) of each token in
the analysis.jsp would be beneficial.  My simple solution was to add a
row to the existing table, convert the Payload byte array to a String
and simply print the results.  I opened SOLR-522 to this effect.

    There is a PayloadHelper class in Lucene that has decode/encode
float and int methods.  Any ideas on how Payloads might be uniformly
decoded into something readable/debuggable from the GUI?  I think bytes
to String will give enough of a clue to be helpful.

Tricia

Re: tweak to analysis.jsp for payload

hossman

:    There is a PayloadHelper class in Lucene that has decode/encode float and
: int methods.  Any ideas on how Payloads might be uniformly decoded into
: something readable/debuggable from the GUI?  I think bytes to String will give
: enough of a clue to be helpful.

I've never really looked at PayloadHelper, but if I were tasked with
trying to find a way to display in HTML an arbitrary byte[] that may or
may not be a String, I would start by attempting a String conversion. If
that succeeds *and* all chars in the resulting String are "printable"
(i.e. Character.isDefined(c) && !Character.isISOControl(c)), then display
the first N chars (where N is some reasonable max size to display). If
not, then just display the first N characters of the hex string
representing the byte[].
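
A minimal sketch of the heuristic described above, assuming the payload
text would be UTF-8. The class and method names (PayloadDisplay, render)
are hypothetical and not part of Solr or Lucene; the strict decoder is one
way to make the "conversion succeeds" check meaningful, since a plain
new String(bytes) never fails.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not a Solr or Lucene class.
final class PayloadDisplay {
    private PayloadDisplay() {}

    /** Returns at most maxChars of the payload as text if printable, else as hex. */
    static String render(byte[] payload, int maxChars) {
        String text = tryDecode(payload);
        if (text != null && isPrintable(text)) {
            return truncate(text, maxChars);
        }
        return truncate(toHex(payload), maxChars);
    }

    // Strict UTF-8 decode (an assumption): returns null on malformed input
    // instead of silently substituting replacement characters.
    private static String tryDecode(byte[] payload) {
        try {
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(payload))
                    .toString();
        } catch (CharacterCodingException e) {
            return null;
        }
    }

    // The "printable" test from the message: defined and not an ISO control char.
    private static boolean isPrintable(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (!Character.isDefined(c) || Character.isISOControl(c)) {
                return false;
            }
        }
        return true;
    }

    private static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    private static String truncate(String s, int max) {
        return s.length() <= max ? s : s.substring(0, max);
    }
}
```

The result would still need HTML escaping before being written into the
page, and truncating by char count can split a surrogate pair.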

It might be overkill, but the other possibility would be to add a
<payloadInspector> config option to <fieldType>. It could be a class
used solely for debugging purposes, and could be declared at arbitrary
points in the <tokenfilter> chain (indicating that from this point on,
this is how to display the payload) or completely outside of the
<analyzer> when using standalone Analyzers (or when the payload structure
is identical for the entire <tokenfilter> chain).


-Hoss

Re: tweak to analysis.jsp for payload

Grant Ingersoll-2
As the guy who wrote PayloadHelper, what I really imagined was using
Lucene's vint, etc. stuff, but that would have taken a bit more
refactoring.  It can be handy for some payloads, but it is still on the
app developer to know what was put in the payload.  What this means in
terms of Solr is still up in the air.  No one has worked through what
adding payloads means yet.


On Apr 4, 2008, at 8:48 PM, Chris Hostetter wrote:

>
> :    There is a PayloadHelper class in Lucene that has decode/encode
> : float and int methods.  Any ideas on how Payloads might be uniformly
> : decoded into something readable/debuggable from the GUI?  I think
> : bytes to String will give enough of a clue to be helpful.
>
> I've never really looked at PayloadHelper, but if I were tasked with
> trying to find a way to display in HTML an arbitrary byte[] that may
> or may not be a String, I would start by attempting a String
> conversion. If that succeeds *and* all chars in the resulting String
> are "printable" (i.e. Character.isDefined(c) &&
> !Character.isISOControl(c)), then display the first N chars (where N
> is some reasonable max size to display). If not, then just display
> the first N characters of the hex string representing the byte[].
>
> It might be overkill, but the other possibility would be to add a
> <payloadInspector> config option to <fieldType>. It could be a class
> used solely for debugging purposes, and could be declared at arbitrary
> points in the <tokenfilter> chain (indicating that from this point on,
> this is how to display the payload) or completely outside of the
> <analyzer> when using standalone Analyzers (or when the payload
> structure is identical for the entire <tokenfilter> chain).
>
>
> -Hoss
>

Re: tweak to analysis.jsp for payload

Mike Klaas
In reply to this post by Tricia Williams-2
On 3-Apr-08, at 3:58 PM, Tricia Williams wrote:

> Hi,
>
>   I think that displaying the payload (if one exists) of each token
> in the analysis.jsp would be beneficial.  My simple solution was to
> add a row to the existing table, convert the Payload byte array to a
> String and simply print the results.  I opened SOLR-522 to this
> effect.
>   There is a PayloadHelper class in Lucene that has decode/encode
> float and int methods.  Any ideas on how Payloads might be uniformly
> decoded into something readable/debuggable from the GUI?  I think
> bytes to String will give enough of a clue to be helpful.

Similarity.scorePayload(), if defined, should be the commonly-used  
method (at least, that's what I do):

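   // Decodes a 4-byte payload as a little-endian IEEE 754 float.
   public float scorePayload(byte[] payload, int offset, int length) {
     assert length == 4;
     // Byte 0 is the least significant; the 0xff mask undoes sign extension.
     int accum = ((payload[0+offset]&0xff)) |
                 ((payload[1+offset]&0xff)<<8) |
                 ((payload[2+offset]&0xff)<<16) |
                 ((payload[3+offset]&0xff)<<24);

     return Float.intBitsToFloat(accum);
   }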
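
For the decoding above to return the intended value, whatever writes the
payload at index time has to use the same byte order. A small sketch of a
matching encoder and a round trip follows; the class and method names are
hypothetical and only illustrate the byte layout.

```java
// Hypothetical helper for illustration; the names are not from Lucene.
final class LittleEndianFloatPayload {
    private LittleEndianFloatPayload() {}

    /** Writes the float's IEEE 754 bits least-significant byte first. */
    static byte[] encode(float value) {
        int bits = Float.floatToIntBits(value);
        return new byte[] {
            (byte) bits,
            (byte) (bits >>> 8),
            (byte) (bits >>> 16),
            (byte) (bits >>> 24)
        };
    }

    /** Mirrors the scorePayload() decoding shown above. */
    static float decode(byte[] payload, int offset) {
        int accum = (payload[offset] & 0xff)
                | ((payload[offset + 1] & 0xff) << 8)
                | ((payload[offset + 2] & 0xff) << 16)
                | ((payload[offset + 3] & 0xff) << 24);
        return Float.intBitsToFloat(accum);
    }

    public static void main(String[] args) {
        byte[] payload = encode(1.5f);
        System.out.println(decode(payload, 0)); // round-trips the original value
    }
}
```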

-Mike

Re: tweak to analysis.jsp for payload

Grant Ingersoll-2
Yes, that is definitely the case, but I think Tricia was more getting
at how to use them for display, i.e. deserializing them into a String
or whatever.  I still have on my plate that I want to figure out how
to incorporate payloads with SpanQuery, as that is the logical means of
getting at them query-wise.

-Grant

On Apr 5, 2008, at 4:51 AM, Mike Klaas wrote:

> On 3-Apr-08, at 3:58 PM, Tricia Williams wrote:
>> Hi,
>>
>>  I think that displaying the payload (if one exists) of each token
>> in the analysis.jsp would be beneficial.  My simple solution was to
>> add a row to the existing table, convert the Payload byte array to
>> a String and simply print the results.  I opened SOLR-522 to this
>> effect.
>>  There is a PayloadHelper class in Lucene that has decode/encode
>> float and int methods.  Any ideas on how Payloads might be
>> uniformly decoded into something readable/debuggable from the GUI?
>> I think bytes to String will give enough of a clue to be helpful.
>
> Similarity.scorePayload(), if defined, should be the commonly-used  
> method (at least, that's what I do):
>
>  public float scorePayload(byte [] payload, int offset, int length) {
>    assert length == 4;
>    int accum = ((payload[0+offset]&0xff)) |
>                ((payload[1+offset]&0xff)<<8) |
>                ((payload[2+offset]&0xff)<<16)  |
>                ((payload[3+offset]&0xff)<<24);
>
>    return Float.intBitsToFloat(accum);
> }
>
> -Mike

Re: tweak to analysis.jsp for payload

Tricia Williams-2
Replies to several comments in this thread inline:

Grant Ingersoll wrote:
> Yes, that is definitely the case, but I think Tricia was more getting
> at how to use them for display, i.e. deserializing them into a String
> or whatever.  I still have on my plate that I want to figure out how
> to incorporate payloads with SpanQuery, as that is the logical means of
> getting at them query-wise.
>
> -Grant
>

Grant is right that my intention is to visualize the Payloads in the
same way that analysis.jsp allows users to visualize what TokenFilters
are doing to the position, term text, token type, and start and end
offsets.  This would be a crude way to debug or demo what your payload
savvy TokenFilter/Tokenizer does to a given TokenStream.

I went through the JIRA issues trying to figure out what was being done
with Payloads to see if this would help clarify my display problem.  I
came across Grant's AnalysisRequestHandler which looks like its intent
is to replace analysis.jsp at some point.  It looks like two short
months ago the call on including Payloads was to punt, "since Solr
doesn't currently support payloads, not much point in outputting them
just yet."  I guess that is what he was trying to tell me in this thread
too.

Grant Ingersoll wrote:
> As the guy who wrote PayloadHelper, what I really imagined was using
> Lucene's vint, etc. stuff, but that would have taken a bit more
> refactoring.  It can be handy for some payloads, but it is still on the
> app developer to know what was put in the payload.  What this means in
> terms of Solr is still up in the air.  No one has worked through what
> adding payloads means yet.

Would it be completely ignorant of me to suggest that an abstraction of
Payload contain a public decode() method with an Object as a return
type?  Or maybe Payload's toString should be overridden to provide a
string representation for display -- possibly doing something like Hoss
described?
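
One way the suggestion above could look, as a rough sketch only. The
DecodablePayload interface and both implementations are hypothetical names
invented for illustration, not Lucene's Payload API, and the UTF-8 and
big-endian encodings are assumptions.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical abstraction: each payload type knows how to decode itself.
interface DecodablePayload {
    Object decode();
}

// Payload whose bytes are assumed to be UTF-8 text.
final class TextPayload implements DecodablePayload {
    private final byte[] data;

    TextPayload(byte[] data) {
        this.data = data.clone();
    }

    @Override
    public Object decode() {
        return new String(data, StandardCharsets.UTF_8);
    }

    // toString() gives the display form a debugging page could print.
    @Override
    public String toString() {
        return String.valueOf(decode());
    }
}

// Payload whose bytes are assumed to be one big-endian 32-bit int.
final class IntPayload implements DecodablePayload {
    private final byte[] data;

    IntPayload(byte[] data) {
        if (data.length != 4) {
            throw new IllegalArgumentException("expected 4 bytes, got " + data.length);
        }
        this.data = data.clone();
    }

    @Override
    public Object decode() {
        return ByteBuffer.wrap(data).getInt(); // ByteBuffer defaults to big-endian
    }

    @Override
    public String toString() {
        return String.valueOf(decode());
    }
}
```

With something like this, the display code would only call toString() and
would not need to know how a given payload was encoded.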

Chris Hostetter wrote:
> I've never really looked at PayloadHelper, but if I were tasked with
> trying to find a way to display in HTML an arbitrary byte[] that may or
> may not be a String, I would start by attempting a String conversion. If
> that succeeds *and* all chars in the resulting String are "printable"
> (i.e. Character.isDefined(c) && !Character.isISOControl(c)), then display
> the first N chars (where N is some reasonable max size to display). If
> not, then just display the first N characters of the hex string
> representing the byte[].
Thanks for the feedback.  It is always appreciated!

Tricia

Re: tweak to analysis.jsp for payload

Grant Ingersoll-2
I don't know just yet that the AnalysisReqH (ARH) is going to replace  
analysis.jsp.  The JSP page does things that the ARH doesn't,  
specifically, handling the output after every token filter.  In my  
mind, the ARH is useful as a Token server for things like machine  
learning (i.e. Mahout :-)  ) and/or other applications that just have  
a need for the final tokens of a document.  I think the response would  
get pretty ugly looking if it were to try to serve up the intermediate  
tokens.  In other words, I have no intention of working on it, but if
someone else comes up w/ a useful way of doing it, then I wouldn't try
to stop it, either.

It might be useful to define a mechanism whereby one can plug a
Payload decoder into Solr that could be used by analysis.jsp.  This
would allow applications a means to make sense of payloads and have
them attached to tokens.
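
A rough sketch of what such a plug-in mechanism might look like, under the
assumption that decoders are looked up by field type name. PayloadDecoder
and PayloadDecoderRegistry are hypothetical names; nothing like them is
confirmed to exist in Solr.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical plug-in point: turns raw payload bytes into display text.
interface PayloadDecoder {
    String decode(byte[] payload);
}

// Hypothetical registry a debugging page could consult per field type.
final class PayloadDecoderRegistry {
    // Fallback when nothing is registered: plain lowercase hex.
    private static final PayloadDecoder HEX = payload -> {
        StringBuilder sb = new StringBuilder(payload.length * 2);
        for (byte b : payload) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    };

    private final Map<String, PayloadDecoder> decoders = new HashMap<>();

    void register(String fieldType, PayloadDecoder decoder) {
        decoders.put(fieldType, decoder);
    }

    String display(String fieldType, byte[] payload) {
        return decoders.getOrDefault(fieldType, HEX).decode(payload);
    }
}
```

An application that knows its payloads are, say, UTF-8 text would register
a decoder for that field type; every other field type falls back to hex.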

-Grant

On Apr 6, 2008, at 1:59 AM, Tricia Williams wrote:

> Replies to several comments in this thread inline:
>
> Grant Ingersoll wrote:
>> Yes, that is definitely the case, but I think Tricia was more
>> getting at how to use them for display, i.e. deserializing them into
>> a String or whatever.  I still have on my plate that I want to
>> figure out how to incorporate payloads with SpanQuery, as that is
>> the logical means of getting at them query-wise.
>>
>> -Grant
>>
>
> Grant is right that my intention is to visualize the Payloads in the  
> same way that analysis.jsp allows users to visualize what  
> TokenFilters are doing to the position, term text, token type, and  
> start and end offsets.  This would be a crude way to debug or demo  
> what your payload savvy TokenFilter/Tokenizer does to a given  
> TokenStream.
>
> I went through the JIRA issues trying to figure out what was being  
> done with Payloads to see if this would help clarify my display  
> problem.  I came across Grant's AnalysisRequestHandler which looks  
> like its intent is to replace analysis.jsp at some point.  It looks  
> like two short months ago the call on including Payloads was to  
> punt, "since Solr doesn't currently support payloads, not much point  
> in outputting them just yet."  I guess that is what he was trying to  
> tell me in this thread too.
>
> Grant Ingersoll wrote:
>> As the guy who wrote PayloadHelper, what I really imagined was
>> using Lucene's vint, etc. stuff, but that would have taken a bit
>> more refactoring.  It can be handy for some payloads, but it is
>> still on the app developer to know what was put in the payload.
>> What this means in terms of Solr is still up in the air.  No one
>> has worked through what adding payloads means yet.
>
> Would it be completely ignorant of me to suggest that an abstraction  
> of Payload contain a public decode() method with an Object as a  
> return type?  Or maybe Payload's toString should be overridden to  
> provide a string representation for display -- possibly doing  
> something like Hoss described?
>
> Chris Hostetter wrote:
>> I've never really looked at PayloadHelper, but if I were tasked
>> with trying to find a way to display in HTML an arbitrary byte[]
>> that may or may not be a String, I would start by attempting a
>> String conversion. If that succeeds *and* all chars in the resulting
>> String are "printable" (i.e. Character.isDefined(c) &&
>> !Character.isISOControl(c)), then display the first N chars (where N
>> is some reasonable max size to display). If not, then just
>> display the first N characters of the hex string representing the
>> byte[].
> Thanks for the feedback.  It is always appreciated!
>
> Tricia

Re: tweak to analysis.jsp for payload

Yonik Seeley-2
As a useful first step for debugging purposes, it seems like the full
hex of the raw bytes should always be output.  If it seems to be
ASCII, that could be put in parens.
Example: 636f6f6c (cool)
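
A minimal sketch of that output format, assuming "seems to be ASCII" means
every byte falls in the printable range 0x20 to 0x7e. The class name
HexPayloadFormatter is hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical formatter for illustration; not part of Solr.
final class HexPayloadFormatter {
    private HexPayloadFormatter() {}

    /** Always emits the full hex; appends the text in parens if all bytes are printable ASCII. */
    static String format(byte[] payload) {
        StringBuilder hex = new StringBuilder(payload.length * 2);
        boolean ascii = payload.length > 0;
        for (byte b : payload) {
            int v = b & 0xff;
            hex.append(Character.forDigit(v >>> 4, 16));
            hex.append(Character.forDigit(v & 0x0f, 16));
            if (v < 0x20 || v > 0x7e) {
                ascii = false; // control byte or non-ASCII byte
            }
        }
        if (!ascii) {
            return hex.toString();
        }
        return hex + " (" + new String(payload, StandardCharsets.US_ASCII) + ")";
    }

    public static void main(String[] args) {
        // prints "636f6f6c (cool)"
        System.out.println(format(new byte[] {0x63, 0x6f, 0x6f, 0x6c}));
    }
}
```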

This can be changed later as payloads gain the ability to be
introspected more fully by Solr.

-Yonik