how to exclude path from being queried

how to exclude path from being queried

Nan Yu
Hi,
    I am trying to find all files that contain a keyword in a directory (and its many sub-directories).

    I did a quick indexing run using

bin/post -c myCore /RootDir

    When I query the index for "keyword", every file whose path contains the keyword is also included in the search results. For example, /RootDir/KeywordReports/FileDoesNotContainKeyword.txt shows up in the results.
     The query is: http://localhost:8983/solr/myCore/select?q=keyword

    Is there a way to exclude files whose path contains the keyword but whose content does not?
    Should I re-index the directory with some extra parameter, or add an extra condition to the query?


Thanks!
Nan 


Re: how to exclude path from being queried

Paras Lehana
Hi Nan,

Are you using PathHierarchyTokenizer <https://lucene.apache.org/solr/guide/8_3/tokenizers.html#Tokenizers-PathHierarchyTokenizer>?
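
For context on why the tokenizer matters, here is a rough Python imitation of what PathHierarchyTokenizer emits with its default "/" delimiter. It is only a sketch for illustration: the function name path_hierarchy_tokens is made up, and the real tokenizer runs inside Solr/Lucene with more options (replacement characters, skipping levels) than are modelled here.

```python
def path_hierarchy_tokens(path, delimiter="/"):
    """Imitate PathHierarchyTokenizer: emit one token per path prefix.

    Hypothetical helper for illustration only; not part of Solr.
    """
    tokens = []
    # Skip a leading delimiter so the first token is "/RootDir", not "".
    start = len(delimiter) if path.startswith(delimiter) else 0
    idx = path.find(delimiter, start)
    while idx != -1:
        tokens.append(path[:idx])  # everything before this delimiter
        idx = path.find(delimiter, idx + 1)
    tokens.append(path)  # the full path is always the last token
    return tokens


tokens = path_hierarchy_tokens(
    "/RootDir/KeywordReports/FileDoesNotContainKeyword.txt"
)
for token in tokens:
    print(token)
```

Each token is a whole path prefix, so a field analyzed this way would not be expected to match a bare word taken from the middle of a directory name.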

On Thu, 19 Dec 2019 at 01:51, Nan Yu <[hidden email]> wrote:

>     Is there a way to exclude files whose content does not contain the
> keyword but the path contains the keyword?
>     Should I re-index the directory using some extra parameter? Or use
> extra condition in the query

--
Regards,

*Paras Lehana* [65871]
Development Engineer, Auto-Suggest,
IndiaMART Intermesh Ltd.

8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
Noida, UP, IN - 201303

Mob.: +91-9560911996
Work: 01203916600 | Extn:  *8173*


Re: how to exclude path from being queried

Shawn Heisey-2
In reply to this post by Nan Yu
On 12/18/2019 1:21 PM, Nan Yu wrote:

>      Is there a way to exclude files whose content does not contain the keyword but the path contains the keyword?
>      Should I re-index the directory using some extra parameter? Or use extra condition in the query

It sounds like your default field is probably a catchall which has the
contents of multiple source fields copied to it, including the content
and the filename.

If you do not want the filename searched, then query a different field
which does not contain that information.  You may need to adjust your
schema and reindex for this to be possible.
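
As a minimal sketch of querying a different field, the Python below builds a select URL that sets Solr's df (default field) parameter. The field name "content" is purely an assumption for illustration; the actual field holding only the file body depends on the schema, which was not shared in this thread. The base URL is the one from the original query.

```python
from urllib.parse import urlencode

# Base select URL taken from the original post.
SOLR_SELECT = "http://localhost:8983/solr/myCore/select"


def build_query_url(keyword, field="content"):
    """Build a select URL that searches only `field`.

    "content" is a hypothetical field name: substitute whichever field
    in your schema holds the file body without the path or filename.
    """
    params = {
        "q": keyword,  # the search term
        "df": field,   # default field used for terms with no field prefix
    }
    return SOLR_SELECT + "?" + urlencode(params)


url = build_query_url("keyword")
print(url)
```

Writing the query as q=content:keyword is an equivalent way to target one field. Either form only helps if the chosen field really excludes the path, which may require the schema change and reindex mentioned above.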

You haven't shared the configs for this index, so it is not possible for
us to confirm that guess.
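
One way to check the guess yourself, assuming the Schema API is reachable on the core, is to list the copyField rules and see what feeds the field being searched. The sketch below is not tested against this particular index: the destination name "_text_" and the sample response are made-up illustrations, and fetch_copy_fields is only defined here, not called.

```python
import json
from urllib.request import urlopen

# Schema API endpoint that lists copyField rules for the core "myCore".
SCHEMA_COPYFIELDS = "http://localhost:8983/solr/myCore/schema/copyfields"


def fetch_copy_fields(url=SCHEMA_COPYFIELDS):
    """Fetch the copyField rules as a dict (needs a running Solr)."""
    with urlopen(url) as resp:
        return json.load(resp)


def sources_for(schema_response, dest):
    """Return the source patterns that are copied into field `dest`."""
    rules = schema_response.get("copyFields", [])
    return [rule["source"] for rule in rules if rule.get("dest") == dest]


# Made-up sample in the shape the endpoint returns, NOT real output
# from the index discussed in this thread.
sample = {
    "copyFields": [
        {"source": "*", "dest": "_text_"},
        {"source": "title", "dest": "title_str"},
    ]
}

print(sources_for(sample, "_text_"))
```

A wildcard source such as "*" feeding the searched field would be consistent with the catchall explanation, since it would pull in the path or filename along with the content.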

Thanks,
Shawn