updating resourcename (metadata) after renaming of the file

updating resourcename (metadata) after renaming of the file

Nan Yu
Hi,
I'm new to Solr, and my question might seem dumb, so please bear with me.
In short: if a file that Solr has already indexed is renamed, can I update only its resourcename and keep all other information in the index unchanged? Or do I have to delete the document and then re-index the file?
Below is the detailed version of my question.
My problem: I have many (tens of thousands of) files whose names and locations are messed up:
(1) Wrong file type in the file name: for example, some files are PDFs but are named .txt.
(2) Wrong directory: a file named 001.pdf should be under a specific folder, such as abc/001.pdf (the folder name "abc" can be found inside 001.pdf), but the file might currently be placed under xyz/001.pdf. So I would like to relocate it from xyz/001.pdf to abc/001.pdf.
I need to correct those errors and also be able to search the content of those documents.
How I was trying to solve the issue using Solr:
(1) Index all files.
(2) From content_type and/or application_name, determine the correct file type (pdf, text, etc.), compare it with the resourcename, and find the files that need to be renamed.
(3) Query for "abc", "xyz", etc. to find the files that need to be relocated.
(4) Rename/relocate the files found in steps (2) and (3).
(5) Update the Solr index.
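Steps (2) and (3) above boil down to computing a corrected path for each document returned by a query. The following is a minimal Python sketch, assuming each result carries resourcename and content_type as Solr returns them (content_type possibly as a one-element list) and that the correct folder name has already been extracted from the document content. The names EXPECTED_EXT, corrected_name and relocated_path are hypothetical, and the MIME-to-extension table is illustrative only.

```python
import posixpath
from typing import Optional, Sequence, Union

# Hypothetical mapping from the MIME type Tika reports in content_type to the
# extension the file should carry; extend it for the types in your corpus.
EXPECTED_EXT = {
    "application/pdf": ".pdf",
    "text/plain": ".txt",
}


def _mime(content_type: Union[str, Sequence[str]]) -> str:
    # Solr may return content_type as a list (multivalued field), and Tika often
    # appends parameters such as "; charset=UTF-8"; keep only the bare type.
    if not isinstance(content_type, str):
        content_type = content_type[0]
    return content_type.split(";")[0].strip().lower()


def corrected_name(resourcename: str, content_type) -> Optional[str]:
    """Step (2): return the renamed path, or None if no rename is needed/known."""
    expected = EXPECTED_EXT.get(_mime(content_type))
    if expected is None:
        return None  # unknown type: leave it for manual review
    root, ext = posixpath.splitext(resourcename)
    if ext.lower() == expected:
        return None  # extension already matches the detected type
    return root + expected


def relocated_path(resourcename: str, folder: str) -> str:
    """Step (3): move the file into `folder`, a sibling of its current directory."""
    current_dir, base = posixpath.split(resourcename)
    parent = posixpath.dirname(current_dir)
    return posixpath.join(parent, folder, base)
```

For example, corrected_name("/testproject/abc/001.txt", "application/pdf") yields "/testproject/abc/001.pdf", and relocated_path("/testproject/xyz/001.pdf", "abc") yields "/testproject/abc/001.pdf". Unknown types deliberately return None so they can be reviewed by hand rather than renamed by guesswork.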
My question: I tried to use the Solr Admin UI to update resourcename. The update went through, but the document with that id now has a new version that contains only resourcename.
Here is how I made the update, on the "Documents" page:
Request-Handler: /update
Document Type: JSON
Document(s):
    {"id": "/testproject/abc/001.txt",
     "resourcename": {"set": ["/testproject/abc/001.pdf"]}}
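For a script that touches thousands of files, the same JSON body can be generated rather than typed into the UI. This is a minimal sketch; atomic_set is a hypothetical helper name. One point that may be relevant to the symptom above: the Solr Reference Guide describes atomic updates as rebuilding the document from the fields already stored in the index (stored or docValues fields), so any field that is not stored would not survive the update.

```python
import json
from typing import Any


def atomic_set(doc_id: str, field: str, value: Any) -> str:
    """Build a JSON update body that sets one field on an existing document."""
    # {"set": value} is Solr's atomic-update modifier; Solr is expected to merge
    # it with the stored fields of the document that already has this id.
    return json.dumps([{"id": doc_id, field: {"set": value}}])


body = atomic_set("/testproject/abc/001.txt", "resourcename",
                  ["/testproject/abc/001.pdf"])
print(body)
```

The printed body is what would be POSTed to the core's /update handler with Content-Type: application/json.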

After submitting it, when I query/browse the document, the only field left is resourcename. All other information, such as content_type, created, etc., is gone.
If I cannot update resourcename alone (and keep all other information), I would have to run a script that deletes each renamed/relocated document and re-indexes it. That is, my script would: find the file, rename/relocate it, delete the original document from the index, and re-index (post) the new file. Is this the best approach?
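The delete-and-re-index approach described above could be sketched as three small pieces: move the file on disk, build a delete command for the old id, and build the URL used to re-post the renamed file. This is a sketch under stated assumptions: SOLR_CORE_URL and the core name "testproject" are hypothetical, the id is assumed to be the file path (as in the example above), and the core is assumed to have the ExtractingRequestHandler at /update/extract, which is what bin/post (SimplePostTool) uses for rich documents. The helper names are hypothetical too, and no request is actually sent here.

```python
import json
import os
import shutil
from urllib.parse import urlencode

# Hypothetical core URL; replace with your own Solr host and core name.
SOLR_CORE_URL = "http://localhost:8983/solr/testproject"


def move_file(old_path: str, new_path: str) -> None:
    """Rename/relocate the file on disk, creating the target folder if needed."""
    os.makedirs(os.path.dirname(new_path), exist_ok=True)
    shutil.move(old_path, new_path)


def delete_body(old_id: str) -> str:
    """JSON update command that removes the stale document by its id."""
    return json.dumps({"delete": {"id": old_id}})


def extract_url(new_id: str) -> str:
    """URL for re-indexing via /update/extract; literal.id sets the new id."""
    params = urlencode({"literal.id": new_id, "commit": "true"})
    return f"{SOLR_CORE_URL}/update/extract?{params}"
```

In a full script, delete_body(...) would be POSTed to SOLR_CORE_URL + "/update" with Content-Type: application/json, and the renamed file's bytes would then be POSTed to extract_url(...). Deleting before re-posting keeps the old id from lingering in search results.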
Thanks!
Nan

Re: updating resourcename (metadata) after renaming of the file

Ray Niu
I think this is the one you are looking for:
http://yonik.com/solr/atomic-updates/

(In reply to Nan Yu, Wed, Dec 4, 2019 at 9:40 AM)

Re: updating resourcename (metadata) after renaming of the file

Nan Yu
Hi Rui Niu,
Thanks for your help! I tried that, but it did not work for resourcename.
The reason might be that resourcename is a metadata field generated by Tika (when posting with SimplePostTool), not a user-specified field (such as "author_s" in the link you provided).
The method you recommended does work on a normal field. For example, with the example techproducts core, if I run:
    curl http://localhost:8983/solr/techproducts/update?commitWithin=1000 -d '[{"id":"GB18030TEST","price":{"set":"5.6"}}]'
I can see the price updated from 0.0 to 5.6.
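The working curl command above can be reproduced from a script with only the Python standard library. This is a minimal sketch; build_update_request is a hypothetical helper, and it sets Content-Type: application/json explicitly, whereas curl -d on its own sends a form content type. The request is only constructed here, not sent.

```python
import json
import urllib.request
from typing import Any


def build_update_request(core_url: str, doc_id: str, field: str, value: Any,
                         commit_within_ms: int = 1000) -> urllib.request.Request:
    """Build a POST that atomically sets one field, like the curl example."""
    body = json.dumps([{"id": doc_id, field: {"set": value}}]).encode("utf-8")
    return urllib.request.Request(
        f"{core_url}/update?commitWithin={commit_within_ms}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_update_request("http://localhost:8983/solr/techproducts",
                           "GB18030TEST", "price", "5.6")
```

Passing req to urllib.request.urlopen would send it to a running Solr instance.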
Nan 

    On Wednesday, December 4, 2019, 12:45:39 PM EST, Rui Niu <[hidden email]> wrote:  
 
 I think this is the one you are looking for:
http://yonik.com/solr/atomic-updates/

On Wed, Dec 4, 2019 at 9:40 AM Nan Yu <[hidden email]> wrote:

> Hi,    I'm new to solr and my question might seems to be dumb. Please
> bear with me.
>    In short, if a file (after indexed by solr) changed its name, could I
> only update resourcename and keep all other information the same in the
> index? Or do I have to delete it and then re-index the file?
>    Below are the detailed version of my question:    My problem: I have
> many (tens of thousands) files that are sort of messed up with file
> name/location:  (1) wrong file type in file name: for example,  some files
> are pdf file but are named as .txt.  (2) wrong directory of file: a file
> named 001.pdf should be under a specific folder, such as: abc/001.pdf. The
> "abc" could be found in 001.pdf. But now, this file might be placed under
> xyz/001.pdf. So, I would like to relocate the file from xyz/001.pdf to
> abc/001.pdf.
>    I need to correct those errors and also being able to search the
> content of those documents.
>    How I was trying to solve the issue using solr:(1) index all files(2)
> from content_type and/or application_name, I could know the correct file
> type (pdf, text etc.) and compare it with the resourcename. Found out those
> files that I need to update.(3) query all "abc", "xyz" etc. and find out
> those files that need to be relocated.(4) rename/relocate files from step
> (2) and (3)(5) update solr index.
>    My question:I tried to use the solr UI to update resourcename. I was
> able to do so but the problem is the id now has a new version that only
> contains resourcename.
> Here is how I made the update:
> In the "Documents" page: Request-Handler: /updateDocument Type:
> JSON:Document(s):  {"id":"/testproject/abc/001.txt",
> "resourcename":{"set":["/testproject/abc/001.pdf"]}}
>
> After submission, and query/browse the file, the only thing left
> is resourcename. All other information, such as content_type, created etc.
> are gone.      If I could not update the " resourcename" (and keep all
> other information), it means that I either run a script to delete the
> document that I renamed/relocated and re-index them, meaning that myscript
> will do: find the file, rename/relocate, delete the original index,
> re-index/post the new file. Is this the best approach?
> Thanks!Nan