[jira] Created: (LUCENE-1209) If setConfig(Config config) is called in resetInputs(), you can turn term vectors off and on by round

Tim Allison (Jira)
If setConfig(Config config) is called in resetInputs(), you can turn term vectors off and on by round
-----------------------------------------------------------------------------------------------------

                 Key: LUCENE-1209
                 URL: https://issues.apache.org/jira/browse/LUCENE-1209
             Project: Lucene - Java
          Issue Type: Improvement
          Components: contrib/benchmark
    Affects Versions: 2.4
            Reporter: Mark Miller
            Priority: Trivial
         Attachments: reset_config.patch

I want to be able to run a single benchmark that tests both with and without term vectors.

Currently this is not easy because you cannot specify term vectors per round.

While you do have to create a new index per round, I prefer this automation to running two separate tests.

If it doesn't affect anything else, it would be great to have setConfig(Config config) called in BasicDocMaker.resetInputs(). This would keep the term vector options up to date per round if you reset.

- Mark
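
To make the proposal concrete, here is a minimal sketch of the idea. RoundConfig and SketchDocMaker are hypothetical stand-ins written for illustration, not the actual contrib/benchmark classes, and the attached patch may differ in detail. The point is only that a value cached in setConfig() stays fixed unless resetInputs() reads it again.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a config that holds one value per round.
class RoundConfig {
    private final Map<String, boolean[]> byRound = new HashMap<>();
    private int round = 0;

    void put(String name, boolean... values) {
        byRound.put(name, values);
    }

    // Returns the value for the current round, cycling through the list.
    boolean get(String name, boolean dflt) {
        boolean[] v = byRound.get(name);
        return (v == null || v.length == 0) ? dflt : v[round % v.length];
    }

    void newRound() {
        round++;
    }
}

// Hypothetical doc maker that caches a config value in a field.
class SketchDocMaker {
    private RoundConfig config;
    private boolean termVectors;

    void setConfig(RoundConfig config) {
        this.config = config;
        this.termVectors = config.get("doc.term.vector", false);
    }

    // The proposed change: re-read the configuration on every inputs reset.
    void resetInputs() {
        setConfig(config);
    }

    boolean usesTermVectors() {
        return termVectors;
    }
}

class ResetInputsDemo {
    public static void main(String[] args) {
        RoundConfig cfg = new RoundConfig();
        cfg.put("doc.term.vector", true, false);
        SketchDocMaker maker = new SketchDocMaker();
        maker.setConfig(cfg);
        for (int round = 0; round < 3; round++) {
            System.out.println("round " + round + ": term vectors = " + maker.usesTermVectors());
            cfg.newRound();
            maker.resetInputs(); // without this call the first value would stick
        }
    }
}
```

Whether re-running all of setConfig() on each reset has side effects elsewhere is exactly the open question raised in the description; the sketch does not answer it.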

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


[jira] Updated: (LUCENE-1209) If setConfig(Config config) is called in resetInputs(), you can turn term vectors off and on by round

Tim Allison (Jira)

     [ https://issues.apache.org/jira/browse/LUCENE-1209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated LUCENE-1209:
--------------------------------

    Attachment: reset_config.patch


[jira] Issue Comment Edited: (LUCENE-1209) If setConfig(Config config) is called in resetInputs(), you can turn term vectors off and on by round

Tim Allison (Jira)
In reply to this post by Tim Allison (Jira)

    [ https://issues.apache.org/jira/browse/LUCENE-1209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12576615#action_12576615 ]

doronc edited comment on LUCENE-1209 at 3/8/08 1:17 PM:
-------------------------------------------------------------

Config maintains properties by round, so this should do the trick:

{code}
doc.term.vector=tvf:true:false
{code}

It sets term-vectors to true in round 0, false in round 1, true in round 2, etc.
Also, a column is added to the reports with the value of this property ('tvf').

Unless you already tried this and it didn't work?
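
As an illustration of the by-round behaviour described above, here is a small sketch. ByRoundValue is a made-up class, not the benchmark's Config, and it assumes only what the comment states: the first token is a label that also names the report column, and the remaining values are reused cyclically as the round number grows.

```java
// Hypothetical stand-in for illustration; not the benchmark's Config class.
class ByRoundValue {
    final String columnName; // report column label, e.g. "tvf"
    final boolean[] values;  // one value per round, reused cyclically

    ByRoundValue(String columnName, boolean[] values) {
        this.columnName = columnName;
        this.values = values;
    }

    // Parses "label:v0:v1:..." into a label and its per-round values.
    static ByRoundValue parse(String raw) {
        String[] parts = raw.split(":");
        if (parts.length < 2) {
            throw new IllegalArgumentException("expected label:v0[:v1...], got " + raw);
        }
        boolean[] vals = new boolean[parts.length - 1];
        for (int i = 1; i < parts.length; i++) {
            vals[i - 1] = Boolean.parseBoolean(parts[i].trim());
        }
        return new ByRoundValue(parts[0].trim(), vals);
    }

    // Picks the value for a round, wrapping around when rounds outnumber values.
    boolean valueForRound(int round) {
        return values[round % values.length];
    }

    public static void main(String[] args) {
        ByRoundValue tv = parse("tvf:true:false");
        for (int round = 0; round < 4; round++) {
            System.out.println("round " + round + " -> " + tv.columnName + "=" + tv.valueForRound(round));
        }
    }
}
```

Because of the wrap-around, two values are enough to alternate over any number of rounds.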




[jira] Commented: (LUCENE-1209) If setConfig(Config config) is called in resetInputs(), you can turn term vectors off and on by round

Tim Allison (Jira)
In reply to this post by Tim Allison (Jira)

    [ https://issues.apache.org/jira/browse/LUCENE-1209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12576615#action_12576615 ]

Doron Cohen commented on LUCENE-1209:
-------------------------------------

Config maintains properties by round, so this should do the trick:

{code}
doc.term.vector=tvf:true:false:true
{code}

It sets term-vectors to true in round 0, false in round 1, true in round 2, etc.
Also, a column is added to the reports with the value of this property ('tvf').

Unless you already tried this and it didn't work?



[jira] Commented: (LUCENE-1209) If setConfig(Config config) is called in resetInputs(), you can turn term vectors off and on by round

Tim Allison (Jira)
In reply to this post by Tim Allison (Jira)

    [ https://issues.apache.org/jira/browse/LUCENE-1209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12576644#action_12576644 ]

Mark Miller commented on LUCENE-1209:
-------------------------------------

It seems to me that it's not working right. Everything that is set in public void setConfig(Config config) is only set once for me, not per round. That is, unless I apply the above patch. This means that I cannot seem to set tokenizing, storing, or term vectors per round.

From what I can tell, it is because setConfig is only called once, and so only the first value is ever read for those properties. The patch above puts the setConfig call in the resetInputs method, which does get called per round. Not sure if that is the best fix, but I know I cannot currently set those per round and have anything but the first setting take effect.

- Mark


[jira] Commented: (LUCENE-1209) If setConfig(Config config) is called in resetInputs(), you can turn term vectors off and on by round

Tim Allison (Jira)
In reply to this post by Tim Allison (Jira)

    [ https://issues.apache.org/jira/browse/LUCENE-1209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12576723#action_12576723 ]

Doron Cohen commented on LUCENE-1209:
-------------------------------------

Mark, you are right that setConfig is called just once, at start.
At least for setting properties by round this should be sufficient.
I wonder why this doesn't work for you.

I tried with this one:

{code}
compound=true

analyzer=org.apache.lucene.analysis.standard.StandardAnalyzer
directory=RamDirectory

doc.stored=true
doc.tokenized=true
doc.term.vector=termVec:false:true
doc.add.log.step=10

doc.maker=org.apache.lucene.benchmark.byTask.feeds.SimpleDocMaker
task.max.depth.log=1

{

    { "Populate"
        CreateIndex
        { AddDoc > : 50
        Optimize
        CloseIndex
    >

    ResetSystemErase
    NewRound

} : 2

RepSumByName
RepSelectByPref Populate
{code}

And got this output:
{code}
 Working Directory: work
 Running algorithm from: conf\termVecByRound.alg
 ------------> config properties:
 analyzer = org.apache.lucene.analysis.standard.StandardAnalyzer
 compound = true
 directory = RamDirectory
 doc.add.log.step = 10
 doc.maker = org.apache.lucene.benchmark.byTask.feeds.SimpleDocMaker
 doc.stored = true
 doc.term.vector = termVec:false:true
 doc.tokenized = true
 task.max.depth.log = 1
 work.dir = work
 -------------------------------
 ------------> algorithm:
 Seq {
     Seq_2 {
         Populate {
             CreateIndex
             Seq_50 {
                 AddDoc
             > * 50
             Optimize
             CloseIndex
         >
         ResetSystemErase
         NewRound
     } * 2
     RepSumByName
     RepSelectByPref Populate
 }
 
 ------------> starting task: Seq
 ------------> starting task: Seq_2
 --> 0.1 sec: main processed (add) 10 docs
 --> 0.1 sec: main processed (add) 20 docs
 --> 0.11 sec: main processed (add) 30 docs
 --> 0.11 sec: main processed (add) 40 docs
 --> 0.11 sec: main processed (add) 50 docs
 ------------> SimpleDocMaker statistics (0):
 num docs added since last inputs reset:                   50
 total bytes added since last inputs reset:             42,150
 
 
 
 --> Round 0-->1:   doc.term.vector:false-->true
 
 --> 0 sec: main processed (add) 60 docs
 --> 0 sec: main processed (add) 70 docs
 --> 0 sec: main processed (add) 80 docs
 --> 0 sec: main processed (add) 90 docs
 --> 0 sec: main processed (add) 100 docs
 ------------> SimpleDocMaker statistics (1):
 num docs added since last inputs reset:                   50
 total bytes added since last inputs reset:             42,150
 
 
 
 --> Round 1-->2:   doc.term.vector:true-->false
 
 
 ------------> Report Sum By (any) Name (2 about 3 out of 4)
 Operation   round termVec   runCnt   recsPerRun        rec/s  elapsedSec    avgUsedMem    avgTotalMem
 Seq_2           0   false        1          106        530.0        0.20       639,912      5,177,344
 Populate        -       -        2           53        706.7        0.15       839,552      5,177,344
 
 
 ------------> Report Select By Prefix (Populate) (2 about 2 out of 4)
 Operation   round termVec   runCnt   recsPerRun        rec/s  elapsedSec    avgUsedMem    avgTotalMem
 Populate        0   false        1           53        378.6        0.14       858,080      5,177,344
 Populate -  -   1 -  true -  -   1 -  -  -   53 -  - 5,300.0 -  -   0.01 -  -  821,024 -  - 5,177,344
 
 ####################
 ###  D O N E !!! ###
 ####################
{code}

Note in particular this line:
{code}
[java] --> Round 0-->1:   doc.term.vector:false-->true
{code}

Note that a *NewRound* command is required in order for the round number to change.
{code}
    NewRound
{code}

A possible cause of the error is that the property definition parsing requires a name prefix (used as the report column name) for multi-valued properties.
So this would not work as expected:
{code}
doc.term.vector=false:true
{code}

But this will work:
{code}
doc.term.vector=termVec:false:true
{code}
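
A small check for this pitfall could look like the sketch below. It assumes, based only on the description above, that everything before the first colon is taken as the label; ByRoundLabelCheck and its methods are hypothetical helpers, not part of the benchmark package, and the real parser may behave differently.

```java
// Hypothetical helper for illustration; not part of contrib/benchmark.
class ByRoundLabelCheck {

    // Everything before the first colon is treated as the label (assumption).
    static String labelOf(String raw) {
        int k = raw.indexOf(':');
        return k < 0 ? null : raw.substring(0, k);
    }

    // Everything after the first colon is treated as the per-round values.
    static String[] valuesOf(String raw) {
        int k = raw.indexOf(':');
        return k < 0 ? new String[] { raw } : raw.substring(k + 1).split(":");
    }

    // Heuristic: a label that itself reads as a boolean or a number
    // suggests that the prefix was forgotten.
    static boolean labelLooksLikeValue(String raw) {
        String label = labelOf(raw);
        if (label == null) {
            return false; // single-valued property, no label expected
        }
        String t = label.trim();
        if (t.equalsIgnoreCase("true") || t.equalsIgnoreCase("false")) {
            return true;
        }
        try {
            Double.parseDouble(t);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        for (String raw : new String[] { "false:true", "termVec:false:true" }) {
            System.out.println(raw + " -> label=" + labelOf(raw)
                    + ", values=" + valuesOf(raw).length
                    + ", suspicious=" + labelLooksLikeValue(raw));
        }
    }
}
```

Under that assumption, false:true would be read as a label named "false" with a single value, which would explain a setting that never changes between rounds.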

If it still doesn't work for you, can you post the algorithm here?


[jira] Issue Comment Edited: (LUCENE-1209) If setConfig(Config config) is called in resetInputs(), you can turn term vectors off and on by round

Tim Allison (Jira)
In reply to this post by Tim Allison (Jira)

    [ https://issues.apache.org/jira/browse/LUCENE-1209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12576770#action_12576770 ]

[hidden email] edited comment on LUCENE-1209 at 3/9/08 6:44 AM:
-------------------------------------------------------------

My algorithm is below.

I see "Round 0-->1:   doc.term.vector:false-->true" as well. However, if I put a debug print on what is returned from public boolean get(String name, boolean dflt), it is only ever called once for "doc.term.vector", and likewise for the other properties read in setConfig.

More importantly, let's say I set it to true:false. If I look at the work/index directory on the second run, there are certainly term vectors. That's how I noticed this to begin with: I was looking at the index and saw the term vector files on every round. It's possible I have something messed up, but every time I run through everything again, it really does not seem to be working. If I set term vectors to false:true, they are never made in any round.

- Mark


{code}
ram.flush.mb=flush:32:32
compound=false

analyzer=org.apache.lucene.analysis.standard.StandardAnalyzer
directory=FSDirectory

doc.stored=true
doc.tokenized=tok:false:true
doc.term.vector=vec:true:false
doc.term.vector.offsets=tvo:false:true
doc.term.vector.positions=tvp:false:true
doc.add.log.step=2000

docs.dir=reuters-out

doc.maker=org.apache.lucene.benchmark.byTask.feeds.ReutersDocMaker

query.maker=org.apache.lucene.benchmark.byTask.feeds.ReutersQueryMaker

# task at this depth or less would print when they start
task.max.depth.log=2

log.queries=true
# -------------------------------------------------------------------------------------

{ "Rounds"
     
    ResetSystemErase

        CreateIndex
        { "MAddDocs" AddDoc(60) } : 20000
        Optimize
        CloseIndex
 
    OpenReader
      { "SrchTrvRetNewRdr" SearchTravRet(10) > : 1000
    CloseReader
    OpenReader
      { "SearchHlgtSameRdr" SearchTravRetHighlight(size[20],highlight[20],mergeContiguous[true],maxFrags[0],fields[body]) > : 1000

    CloseReader

    RepSumByPref SearchHlgtSameRdr

    NewRound

} : 2

RepSumByNameRound
RepSumByName
RepSumByPrefRound MAddDocs
{code}



[jira] Commented: (LUCENE-1209) If setConfig(Config config) is called in resetInputs(), you can turn term vectors off and on by round

Tim Allison (Jira)
In reply to this post by Tim Allison (Jira)

    [ https://issues.apache.org/jira/browse/LUCENE-1209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12576770#action_12576770 ]

Mark Miller commented on LUCENE-1209:
-------------------------------------

My algorithm is below.

I see "Round 0-->1:   doc.term.vector:false-->true" as well. However, if I put a debug print on what is returned from public boolean get(String name, boolean dflt), it is only ever called once for "doc.term.vector", and likewise for the other properties read in setConfig.

More importantly, let's say I set it to true:false. If I look at the work/index directory on the second run, there are certainly term vectors. That's how I noticed this to begin with: I was looking at the index and saw the term vector files on every round. It's possible I have something messed up, but every time I run through everything again, it really does not seem to be working. If I set term vectors to false:true, they are never made in any round.

- Mark


<code>
ram.flush.mb=flush:32:32
compound=false

analyzer=org.apache.lucene.analysis.standard.StandardAnalyzer
directory=FSDirectory

doc.stored=true
doc.tokenized=tok:false:true
doc.term.vector=vec:true:false
doc.term.vector.offsets=tvo:false:true
doc.term.vector.positions=tvp:false:true
doc.add.log.step=2000

docs.dir=reuters-out

doc.maker=org.apache.lucene.benchmark.byTask.feeds.ReutersDocMaker

query.maker=org.apache.lucene.benchmark.byTask.feeds.ReutersQueryMaker

# task at this depth or less would print when they start
task.max.depth.log=2

log.queries=true
# -------------------------------------------------------------------------------------

{ "Rounds"
     
    ResetSystemErase

        CreateIndex
        { "MAddDocs" AddDoc(60) } : 20000
        Optimize
        CloseIndex
 
    OpenReader
      { "SrchTrvRetNewRdr" SearchTravRet(10) > : 1000
    CloseReader
    OpenReader
      { "SearchHlgtSameRdr" SearchTravRetHighlight(size[20],highlight[20],mergeContiguous[true],maxFrags[0],fields[body]) > : 1000

    CloseReader

    RepSumByPref SearchHlgtSameRdr

    NewRound

} : 2

RepSumByNameRound
RepSumByName
RepSumByPrefRound MAddDocs
</code>


[jira] Issue Comment Edited: (LUCENE-1209) If setConfig(Config config) is called in resetInputs(), you can turn term vectors off and on by round

Tim Allison (Jira)
In reply to this post by Tim Allison (Jira)

    [ https://issues.apache.org/jira/browse/LUCENE-1209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12576770#action_12576770 ]

[hidden email] edited comment on LUCENE-1209 at 3/9/08 6:51 AM:
-------------------------------------------------------------

My algorithm is below.

I see "Round 0-->1:   doc.term.vector:false-->true" as well. However, if I put a debug print on what is returned from public boolean get(String name, boolean dflt), it is only ever called once for "doc.term.vector", and likewise for the other properties read in setConfig.

More importantly, let's say I set it to true:false. If I look at the work/index directory on the second run, there are certainly term vectors. That's how I noticed this to begin with: I was looking at the index and saw the term vector files on every round. It's possible I have something messed up, but every time I run through everything again, it really does not seem to be working. If I set term vectors to false:true, they are never made in any round.

>>Mark you are right that setConfig is called just once, at start.
>>At least for setting properties by round this should be sufficient.
>>I wonder why this doesn't work for you.

I think this admits the problem, right? The property get for everything in setConfig is only called once. That loads up the "false:true", returns false, and sets up "true" to be returned on the next call. The next time you call get on Config you would get the "true", but there is no next time. It's only done once, so it shows up right in the output "Round 0-->1:   doc.term.vector:false-->true", but it's only ever called once and so only loads false.

- Mark
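
The effect described above can be reproduced in isolation. This is a rough sketch under the assumption that the doc maker stores whatever get() returned at setConfig() time; CountingConfig is a made-up class with a read counter standing in for the debug print, not the benchmark's Config.

```java
// Hypothetical stand-in for illustration; not the benchmark's Config class.
class CountingConfig {
    private final boolean[] values;
    private int round = 0;
    private int reads = 0;

    CountingConfig(boolean... values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("at least one value is required");
        }
        this.values = values;
    }

    // Counted read, standing in for the property lookup done in setConfig().
    boolean get() {
        reads++;
        return values[round % values.length];
    }

    // Uncounted read, used only to show what the config would return now.
    boolean peek() {
        return values[round % values.length];
    }

    void newRound() {
        round++;
    }

    int round() {
        return round;
    }

    int reads() {
        return reads;
    }
}

class StaleReadDemo {
    public static void main(String[] args) {
        CountingConfig cfg = new CountingConfig(false, true);
        boolean cached = cfg.get(); // what a one-time setConfig() would store
        for (int i = 0; i < 2; i++) {
            System.out.println("round " + cfg.round()
                    + ": config says " + cfg.peek()
                    + ", doc maker uses " + cached);
            cfg.newRound();
        }
        System.out.println("get() calls: " + cfg.reads());
    }
}
```

The round log and the config agree with each other, while the cached field keeps the round 0 value, which matches seeing the correct "Round 0-->1" line in the output alongside an unchanged index.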


{code}
ram.flush.mb=flush:32:32
compound=false

analyzer=org.apache.lucene.analysis.standard.StandardAnalyzer
directory=FSDirectory

doc.stored=true
doc.tokenized=tok:false:true
doc.term.vector=vec:true:false
doc.term.vector.offsets=tvo:false:true
doc.term.vector.positions=tvp:false:true
doc.add.log.step=2000

docs.dir=reuters-out

doc.maker=org.apache.lucene.benchmark.byTask.feeds.ReutersDocMaker

query.maker=org.apache.lucene.benchmark.byTask.feeds.ReutersQueryMaker

# task at this depth or less would print when they start
task.max.depth.log=2

log.queries=true
# -------------------------------------------------------------------------------------

{ "Rounds"
     
    ResetSystemErase

        CreateIndex
        { "MAddDocs" AddDoc(60) } : 20000
        Optimize
        CloseIndex
 
    OpenReader
      { "SrchTrvRetNewRdr" SearchTravRet(10) > : 1000
    CloseReader
    OpenReader
      { "SearchHlgtSameRdr" SearchTravRetHighlight(size[20],highlight[20],mergeContiguous[true],maxFrags[0],fields[body]) > : 1000

    CloseReader

    RepSumByPref SearchHlgtSameRdr

    NewRound

} : 2

RepSumByNameRound
RepSumByName
RepSumByPrefRound MAddDocs
{code}



[jira] Assigned: (LUCENE-1209) If setConfig(Config config) is called in resetInputs(), you can turn term vectors off and on by round

Tim Allison (Jira)
In reply to this post by Tim Allison (Jira)

     [ https://issues.apache.org/jira/browse/LUCENE-1209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doron Cohen reassigned LUCENE-1209:
-----------------------------------

    Assignee: Doron Cohen


[jira] Commented: (LUCENE-1209) If setConfig(Config config) is called in resetInputs(), you can turn term vectors off and on by round

Tim Allison (Jira)
In reply to this post by Tim Allison (Jira)

    [ https://issues.apache.org/jira/browse/LUCENE-1209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12576781#action_12576781 ]

Doron Cohen commented on LUCENE-1209:
-------------------------------------

OK, I can see it now, you're right.
So all per-round doc maker settings were ignored and the first round's settings were used in every round.
I am updating TestPerfTasksLogic.testIndexWriterSettings() to catch this bug.
Thanks for catching this,
Doron
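
The failure mode described above can be sketched in isolation. The classes below are hypothetical stand-ins, not the contrib/benchmark source: `RoundConfig` resolves a per-round value such as `vec:true:false` (cycling through the values is an assumption of this sketch), and `CachingDocMaker` reads the flag only inside `setConfig`. With `rereadOnReset` off, the first round's value sticks; with it on, `resetInputs` re-applies the config, which is the change the issue proposes.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

class PerRoundDemo {
    /** Hypothetical config holding per-round values such as "vec:true:false". */
    static class RoundConfig {
        private final Map<String, String> props = new HashMap<>();
        private int round = 0;

        void set(String key, String value) { props.put(key, value); }

        void newRound() { round++; }

        /** Resolves "name:v0:v1:..." to the current round's value; plain values pass through. */
        boolean getBoolean(String key, boolean defaultValue) {
            String raw = props.get(key);
            if (raw == null) return defaultValue;
            String[] parts = raw.split(":");
            if (parts.length == 1) return Boolean.parseBoolean(parts[0]);
            // parts[0] is the column name; values cycle when rounds outnumber them (assumption)
            int idx = 1 + (round % (parts.length - 1));
            return Boolean.parseBoolean(parts[idx]);
        }
    }

    /** Hypothetical doc maker that caches the term vector flag when setConfig is called. */
    static class CachingDocMaker {
        private final boolean rereadOnReset;
        private RoundConfig config;
        private boolean termVectors;

        CachingDocMaker(boolean rereadOnReset) { this.rereadOnReset = rereadOnReset; }

        void setConfig(RoundConfig config) {
            this.config = config;
            this.termVectors = config.getBoolean("doc.term.vector", false);
        }

        void resetInputs() {
            // Re-applying the config here is what keeps per-round options current
            if (rereadOnReset) setConfig(config);
        }

        boolean usesTermVectors() { return termVectors; }
    }

    /** Runs two rounds and returns the term vector flag the maker used in each. */
    static boolean[] run(boolean rereadOnReset) {
        RoundConfig config = new RoundConfig();
        config.set("doc.term.vector", "vec:true:false");
        CachingDocMaker maker = new CachingDocMaker(rereadOnReset);
        maker.setConfig(config);
        boolean first = maker.usesTermVectors();
        config.newRound();
        maker.resetInputs();
        boolean second = maker.usesTermVectors();
        return new boolean[] { first, second };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(run(false))); // [true, true]: round 2 still uses round 1's value
        System.out.println(Arrays.toString(run(true)));  // [true, false]
    }
}
```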


[jira] Updated: (LUCENE-1209) If setConfig(Config config) is called in resetInputs(), you can turn term vectors off and on by round

Tim Allison (Jira)
In reply to this post by Tim Allison (Jira)

     [ https://issues.apache.org/jira/browse/LUCENE-1209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doron Cohen updated LUCENE-1209:
--------------------------------

    Attachment: reset_config.patch

Same fix, plus a test case that fails without the fix.


[jira] Updated: (LUCENE-1209) If setConfig(Config config) is called in resetInputs(), you can turn term vectors off and on by round

Tim Allison (Jira)
In reply to this post by Tim Allison (Jira)

     [ https://issues.apache.org/jira/browse/LUCENE-1209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doron Cohen updated LUCENE-1209:
--------------------------------

    Attachment: reset_config.patch

QualityTest fails with the previous patch, exposing a related bug in ReutersDocMaker:
it does not reset its files list when setConfig() is called. This was not required before,
but now that setConfig() is called more than once, the list of collected files must be cleared.
The attached file fixes this, and all benchmark tests pass.
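
The related bug can be illustrated with a hypothetical file-collecting maker. The names below are illustrative only and are not taken from the ReutersDocMaker source: if `setConfig` appends to a list that is never cleared, a second call lists every input file twice.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class FileListDemo {
    /** Hypothetical maker that collects input file names whenever setConfig is called. */
    static class FileCollectingMaker {
        private final boolean clearOnSetConfig;
        private final List<String> inputFiles = new ArrayList<>();

        FileCollectingMaker(boolean clearOnSetConfig) { this.clearOnSetConfig = clearOnSetConfig; }

        /** Stands in for scanning docs.dir; names are passed in to keep the sketch self-contained. */
        void setConfig(List<String> filesInDocsDir) {
            if (clearOnSetConfig) {
                inputFiles.clear(); // needed once setConfig can run more than once
            }
            inputFiles.addAll(filesInDocsDir);
        }

        int fileCount() { return inputFiles.size(); }
    }

    /** Calls setConfig twice, as happens when the config is re-applied for a new round. */
    static int countAfterTwoCalls(boolean clearOnSetConfig) {
        List<String> dir = Arrays.asList("a.txt", "b.txt", "c.txt");
        FileCollectingMaker maker = new FileCollectingMaker(clearOnSetConfig);
        maker.setConfig(dir);
        maker.setConfig(dir);
        return maker.fileCount();
    }

    public static void main(String[] args) {
        System.out.println(countAfterTwoCalls(false)); // 6: every file is listed twice
        System.out.println(countAfterTwoCalls(true));  // 3
    }
}
```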


[jira] Resolved: (LUCENE-1209) If setConfig(Config config) is called in resetInputs(), you can turn term vectors off and on by round

Tim Allison (Jira)
In reply to this post by Tim Allison (Jira)

     [ https://issues.apache.org/jira/browse/LUCENE-1209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doron Cohen resolved LUCENE-1209.
---------------------------------

       Resolution: Fixed
    Lucene Fields: [Patch Available]  (was: [Patch Available, New])

Committed, thanks Mark!
