Search for a specific unicode char

tedsolr
I'm having some trouble with non-printable, but valid, UTF-8 characters when
exporting to Amazon Redshift. The export fails, but I can't yet find this
data in my Solr collection. How can I search, say from the admin console,
for a particular character? I'm looking for U+001E and U+001F.

thanks!
Solr 5.5.4



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: Search for a specific unicode char

tedsolr
This is an example of what the data looks like:

  "SOURCEFILEID":"77907",
        "APPROP_GROUP_CODE_T":"F\u0000G\u0000R",
        "APPROP_GROUP_CODE_T_aggr":"F\u0000G\u0000R",
        "APPROP_GROUP_CODE_T_search":"F\u0000G\u0000R",
        "OBJECT_DESC_T":"OTHER PROFESSIONAL/TECHNICAL SERVICES",

That's a snippet from a query result. "\u0000" is the null character
(U+0000). I don't know why this data is presented in this style. I still
don't know how to search for a single Unicode character. A search using the
value as shown above does work:
q=APPROP_GROUP_CODE_T:"F\u0000G\u0000R"
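For a single character rather than the whole value, one option might be a wildcard query with the raw character percent-encoded in the URL. The following is only a sketch: the base URL and collection name are hypothetical, and whether the wildcard matches at all depends on how the field is analyzed (a tokenizing field type may drop control characters at index time).

```python
from urllib.parse import urlencode


def control_char_query_url(base_url, field, codepoint):
    """Build a /select URL whose q asks for values containing one code point.

    base_url and field are supplied by the caller; nothing here talks to Solr.
    """
    char = chr(codepoint)
    # Wildcards on both sides: match the character anywhere in the value.
    params = {"q": f"{field}:*{char}*", "rows": 10}
    # urlencode percent-encodes the control character (U+001E becomes %1E).
    return base_url + "/select?" + urlencode(params)


# Hypothetical host and collection name, for illustration only.
url = control_char_query_url(
    "http://localhost:8983/solr/mycollection", "APPROP_GROUP_CODE_T", 0x1E
)
print(url)
```

The same function can be called with 0x1F for the other character.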



Re: Search for a specific unicode char

Christopher Schultz
In reply to this post by tedsolr

To whom it may concern,

On 7/31/18 2:56 PM, tedsolr wrote:
> I'm having some trouble with non printable, but valid, UTF8 chars
> when exporting to Amazon Redshift. The export fails but I can't yet
> find this data in my Solr collection. How can I search, say from
> the admin console, for a particular character? I'm looking for
> U+001E and U+001F

Try copy/pasting from e.g.
https://www.fileformat.info/info/unicode/char/001e/browsertest.htm

Or url-decode this string (%1e) here:
https://meyerweb.com/eric/tools/dencoder/

and paste it into your search box.
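If a web tool isn't handy, the same URL-decoding can be done locally. A minimal sketch using Python's standard library, showing that %1e and %1f decode to U+001E and U+001F and encode back again:

```python
from urllib.parse import quote, unquote

# Decode the percent-encoded forms into the actual control characters.
record_separator = unquote("%1e")  # U+001E
unit_separator = unquote("%1f")    # U+001F

# repr() makes the otherwise invisible characters visible.
print(repr(record_separator), repr(unit_separator))

# Going the other way gives the form to put in a request URL.
encoded = quote(record_separator)
print(encoded)
```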

Do you have the source-data for the index? Maybe it's easier to locate
the character in the source-data than in the index.
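As a rough sketch of checking the source data, assuming it is available as lines of text, something like this reports where U+001E and U+001F occur. The function name and the sample lines are made up for illustration.

```python
TARGETS = {"\u001e", "\u001f"}


def find_control_chars(lines, targets=TARGETS):
    """Return (line number, column, code point) for each target character found."""
    hits = []
    for line_no, line in enumerate(lines, start=1):
        for col, ch in enumerate(line, start=1):
            if ch in targets:
                hits.append((line_no, col, f"U+{ord(ch):04X}"))
    return hits


# Made-up sample data; the second line contains both characters.
sample = ["OTHER PROFESSIONAL/TECHNICAL SERVICES", "F\u001eG\u001fR"]
hits = find_control_chars(sample)
print(hits)
```

For a real file, pass the open file object (opened with encoding="utf-8") instead of the sample list. Avoid str.splitlines() for this, since it treats U+001E as a line boundary and would remove the character being searched for.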

-chris