Indexing a URL

Bill Fowler-2
Hello,

I am trying to post the following to my index:

<field name="url">http://www.nytimes.com/2007/08/25/business/worldbusiness/25yuan.html?ex=1345694400&en=499af384a9ebd18f&ei=5088&partner=rssnyt&emc=rss
</field>

The url field is defined as:

   <field name="url" type="string" indexed="false" stored="true" />

However, I get the following error:

Posting file docstor/ffc110ee5c9a2ed28c8f35aa243bb53b.xml to
http://localhost:8983/news_feed/update
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"/>
<title>Error 500 </title>
</head>
<body><h2>HTTP ERROR: 500</h2><pre>ParseError at [row,col]:[3,104]
Message: The reference to entity "en" must end with the ';' delimiter.

It is apparently attempting to parse &en=499af384a9ebd18f in the URL.  I am
not clear why it would do this as I specified indexed="false."  I need to
store this because that is how the user gets to the original article.

Is there any data type that simply ignores the characters in the field?  I
don't care that it can't be a search field.  I've tried the "ignored" field
type and it still gives me the same error.

Thanks,

Bill

Re: Indexing a URL

Brian Whitman

> It is apparently attempting to parse &en=499af384a9ebd18f in the URL.  I am
> not clear why it would do this as I specified indexed="false."  I need to
> store this because that is how the user gets to the original article.

The ampersand is a reserved character in XML. You have to escape it
(turn it into &amp;) whether you are indexing the data or not.
This has nothing to do with Solr; it applies to XML files in general.
Whatever you're using to generate the XML should be able to handle this for you.
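
For illustration, here is a minimal sketch of that escaping step. The thread does not say which tool produces the XML, so the choice of Python and the variable names below are assumptions; only the URL and the field name come from the original post. It shows the raw URL being rejected by an XML parser, then two ways to produce well-formed output: escaping by hand, or letting an XML library serialize the text.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# URL taken from the original post; it contains several bare ampersands.
url = ("http://www.nytimes.com/2007/08/25/business/worldbusiness/25yuan.html"
       "?ex=1345694400&en=499af384a9ebd18f&ei=5088&partner=rssnyt&emc=rss")

# Pasting the URL in unescaped yields XML that is not well-formed,
# so any XML parser rejects it, regardless of the Solr schema.
try:
    ET.fromstring('<field name="url">%s</field>' % url)
    raw_rejected = False
except ET.ParseError:
    raw_rejected = True

# Option 1: escape the text by hand (& becomes &amp;, < becomes &lt;, > becomes &gt;).
field_xml = '<field name="url">%s</field>' % escape(url)

# Option 2: build the element with an XML library and let it escape on output.
field = ET.Element("field", {"name": "url"})
field.text = url
serialized = ET.tostring(field, encoding="unicode")

# Parsing the escaped document gives back the original URL unchanged,
# so the stored value still has plain ampersands.
parsed = ET.fromstring(field_xml)
print(raw_rejected)
print(field_xml)
print(parsed.text == url)
```

The escaping only affects the XML on the wire. Once the document is parsed, the field value is the original URL with plain ampersands.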



Re: Indexing a URL

Bill Fowler-2
Thanks, Brian.

On 9/5/07, Brian Whitman <[hidden email]> wrote:

> > It is apparently attempting to parse &en=499af384a9ebd18f in the URL.  I am
> > not clear why it would do this as I specified indexed="false."  I need to
> > store this because that is how the user gets to the original article.
>
> The ampersand is a reserved character in XML. You have to escape it
> (turn it into &amp;) whether you are indexing the data or not.
> This has nothing to do with Solr; it applies to XML files in general.
> Whatever you're using to generate the XML should be able to handle this for you.