Need additional info for Field (hoping friends who can read Chinese can offer some ideas)

王建新
My description in English may not have been very clear, sorry :)


----- Original Message -----
From: 王建新
To: Chris
Sent: Tuesday, April 22, 2008 9:52 AM
Subject: Re: Need addtional info for Field


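Thanks.
My problem is this: I need to build an index over a batch of video files. Before indexing, I have already worked out which subtitle text appears at which time in each video.
In this setup one video program corresponds to one Document, and I need (would like) to index the subtitles as follows:
    Field("Sub","下午去开会","01:02:02");
    Field("Sub","后天去开会","01:03:05");
    [Note: "01:02:02" is the attached time; Lucene does not provide this usage. The subtitle "下午去开会" means "going to a meeting this afternoon" and "后天去开会" means "going to a meeting the day after tomorrow".]

These two Fields say that in the current video the subtitle "下午去开会" appears at 01:02:02 and "后天去开会" appears at 01:03:05. If a user searches for "下午" ("afternoon"), the video matches, but only the first Field is matched, so only the time "01:02:02" is needed. If the user searches for "开会" ("meeting"), both Fields match, so both "01:02:02" and "01:03:05" are needed.
I am not sure whether I have explained this clearly.

I would like to know whether Lucene can solve this in some way and, if not, how Lucene would need to be modified.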

王建新
  ----- Original Message -----
  From: Chris
  To: 王建新
  Sent: Monday, April 21, 2008 7:34 PM
  Subject: Re: Need addtional info for Field


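  Could you describe the feature a little more clearly? Handled this way, it seems word segmentation would be needed....

  But I see you did not segment the text, and if the same field name holds multi-pair values, they are not stored as a String.

  That is all.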
                     Chris.

   
  2008/4/21, 王建新 <[hidden email]>:
    Can you read Chinese?

    I do not quite understand what you mean.
    Are you saying this problem can be solved with Lucene's existing functionality?

      ----- Original Message -----
      From: Chris
      To: 王建新
      Sent: Monday, April 21, 2008 5:14 PM
      Subject: Re: Need addtional info for Field

       
      This problem is not solved by Lucene itself, but another method will solve it.

      The structure is not defined like this either ......

      You may want to check it more carefully....

      That is all.
                     Chris.

       
      2008/4/21, 王建新 <[hidden email]>:
        Hi Chris, it is me, 王建新.

        I have a new problem. Could you give me any advice? Thank you.


        I want to use Lucene with some additional info attached to each Field, like this:

        1. Index

            // Wished-for usage: the third argument is extra info attached to the Field.
            // Lucene does not provide this usage.
            Document additionalDoc = new Document();

            additionalDoc.add(new Field("field", "AA BB", "Additional info ..............."));
            additionalDoc.add(new Field("field", "BB CC", "Additional info 222222222222222222222222..............."));

            writer.addDocument(additionalDoc);

            ........


        2. Search

            Searcher searcher;
            ....

            searcher.search(new TermQuery(new Term("field", "BB")));

        In this case, I want Lucene to return additionalDoc and also tell me which Fields were matched, so that I can then read the additional info from the matched Fields.

        Can Lucene do this in version 2.3.1?



      --
      Chris Lin
      [hidden email]
      Taipei , Taiwan.
      -----------------------------------------------------------



  --
  Chris Lin
  [hidden email]
  Taipei , Taiwan.
  -----------------------------------------------------------

RE: Need additional info for Field (hoping friends who can read Chinese can offer some ideas)

Fang_Li
Try using a payload, which is stored as additional information. Currently Lucene only supports per-token payloads, but you can add an arbitrary token to carry the time information.

What does the query cover? Only the subtitle, or both subtitle and time?
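
The payload idea above can be pictured with a small plain-Java model. This is only a sketch under assumptions: it does not use Lucene's payload classes, the class name `PayloadSketch`, the video id "video-1" and the pre-split token lists are made up for illustration, and it attaches the time to every token of a subtitle line, which is just one possible reading of the suggestion.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Plain-Java model of per-token payloads. This is NOT Lucene's API. */
final class PayloadSketch {

    /** One occurrence of a token; the subtitle time plays the role of the payload. */
    static final class Posting {
        final String video;
        final String time;

        Posting(String video, String time) {
            this.video = video;
            this.time = time;
        }
    }

    // token -> every occurrence of that token, in indexing order
    private final Map<String, List<Posting>> postings = new HashMap<>();

    /** Indexes one subtitle line; each of its tokens carries the same time payload. */
    void addSubtitle(String video, String time, List<String> tokens) {
        for (String token : tokens) {
            postings.computeIfAbsent(token, k -> new ArrayList<>())
                    .add(new Posting(video, time));
        }
    }

    /** Returns the distinct times at which the term occurs in the given video. */
    List<String> timesFor(String video, String term) {
        Set<String> times = new LinkedHashSet<>();
        for (Posting p : postings.getOrDefault(term, Collections.<Posting>emptyList())) {
            if (p.video.equals(video)) {
                times.add(p.time);
            }
        }
        return new ArrayList<>(times);
    }

    public static void main(String[] args) {
        PayloadSketch index = new PayloadSketch();
        // Tokens are pre-split by hand here; real Chinese text needs a segmenter.
        index.addSubtitle("video-1", "01:02:02", Arrays.asList("下午", "去", "开会"));
        index.addSubtitle("video-1", "01:03:05", Arrays.asList("后天", "去", "开会"));

        System.out.println(index.timesFor("video-1", "下午")); // only the first line matches
        System.out.println(index.timesFor("video-1", "开会")); // both lines match
    }
}
```

In this model the video stays a single logical document, and the time is read back from the matching token occurrences rather than from a stored field.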

Regards,


Re: Need additional info for Field (hoping friends who can read Chinese can offer some ideas)

王建新

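Thanks. I only search the sub field, not the time; when searching sub, I just want the time that belongs to the matched Field.
It seems a payload cannot do this?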


Re: Need additional info for Field (hoping friends who can read Chinese can offer some ideas)

Cedric Ho
In that case you may want to index each:

Field("Sub","下午去开会","01:02:02");

as a separate document, so that each document contains 3 fields:
1. title
2. time
3. sub

Then you can get both title and time by searching the "sub" field.
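
The one-document-per-subtitle layout above can be sketched in plain Java as follows. This is an illustration under assumptions, not Lucene code: `SubtitleDocs`, the titles "video-1" and "video-2", and the third sample subtitle are hypothetical, and a naive substring test stands in for a tokenized query on the "sub" field.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java model of "one document per subtitle line". This is NOT Lucene's API. */
final class SubtitleDocs {

    /** One small document with the three fields named in the post. */
    static final class SubtitleDoc {
        final String title;
        final String time;
        final String sub;

        SubtitleDoc(String title, String time, String sub) {
            this.title = title;
            this.time = time;
            this.sub = sub;
        }
    }

    private final List<SubtitleDoc> docs = new ArrayList<>();

    /** Adds one subtitle line as its own document. */
    void add(String title, String time, String sub) {
        docs.add(new SubtitleDoc(title, time, sub));
    }

    /** Searches only the sub field and groups the matching times by video title. */
    Map<String, List<String>> timesByTitle(String term) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (SubtitleDoc doc : docs) {
            if (doc.sub.contains(term)) { // stand-in for a real term query
                result.computeIfAbsent(doc.title, k -> new ArrayList<>()).add(doc.time);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        SubtitleDocs index = new SubtitleDocs();
        index.add("video-1", "01:02:02", "下午去开会");
        index.add("video-1", "01:03:05", "后天去开会");
        index.add("video-2", "00:10:00", "明天休息"); // hypothetical extra subtitle

        System.out.println(index.timesByTitle("下午")); // one hit in video-1
        System.out.println(index.timesByTitle("开会")); // two hits in video-1
    }
}
```

Because every hit is its own document, the matched time comes back directly with the hit; grouping by title recovers the per-video view.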

Cedric



Re: Need additional info for Field (hoping friends who can read Chinese can offer some ideas)

kai.hu
You only need to index and tokenize "下午去开会" and store the corresponding time with it. For example:
document.add(new Field("sub", "下午去开会", Field.Store.YES, Field.Index.TOKENIZED));
document.add(new Field("time", "01:02:02", Field.Store.YES, Field.Index.UN_TOKENIZED));
Each single document returned by the search will then contain both of these Fields.


Re: Need additional info for Field (hoping friends who can read Chinese can offer some ideas)

kai.hu
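Search Google for Chinese word segmentation (中文分词). Besides Che Dong's (车东) package there should be plenty of others. If you find a segmenter that works better or more efficiently, please recommend it to me as well.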
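
As a rough illustration of why segmentation matters here, one simple alternative to a dictionary-based segmenter is to index overlapping two-character tokens (bigrams). The sketch below is plain Java and is not Che Dong's package or any particular library; the class name `BigramSketch` is made up, and it assumes the text contains only characters that fit in a single Java `char`.

```java
import java.util.ArrayList;
import java.util.List;

/** Splits text into overlapping two-character tokens. Illustration only. */
final class BigramSketch {

    /** Returns every pair of adjacent characters; a one-character text is returned as is. */
    static List<String> bigrams(String text) {
        List<String> tokens = new ArrayList<>();
        if (text.length() == 1) {
            tokens.add(text);
            return tokens;
        }
        for (int i = 0; i + 1 < text.length(); i++) {
            tokens.add(text.substring(i, i + 2));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints [下午, 午去, 去开, 开会]
        System.out.println(bigrams("下午去开会"));
    }
}
```

With tokens like these, a query for "下午" or "开会" matches the subtitle, at the cost of also indexing pairs such as "午去" that are not real words.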


Re: Need additional info for Field (hoping friends who can read Chinese can offer some ideas)

王建新

OK, thanks!
