FP Growth Understanding


Grant Ingersoll-2
I ran:

./mahout fpg -i <PATH>/content/freqitemset/accidents.dat -o patterns -k 50 -method mapreduce -g 10 -regex [\ ]

Per http://cwiki.apache.org/confluence/display/MAHOUT/ParallelFrequentPatternMining

And now I see
> ls patterns/
fpgrowth/               frequentPatterns/       parallelcounting/       sortedoutput/

Looking in:  ./mahout seqdump --seqFile patterns/fpgrowth/part-r-00000

I see:
Input Path: patterns/fpgrowth/part-r-00000
Key class: class org.apache.hadoop.io.Text Value Class: class org.apache.mahout.fpm.pfpgrowth.convertors.string.TopKStringPatterns
Key: 68: Value: ([68],90692), ([17, 68],90683), ([12, 68],90490), ([17, 12, 68],90481), ([18, 68],90291), ([17, 18, 68],90282), ([12, 18, 68],90229), ([17, 12, 18, 68],90220), ([31, 68],89071), ([17, 31, 68],89062), ([12, 31, 68],88874), ([17, 12, 31, 68],88865), ([18, 31, 68],88681), ([17, 18, 31, 68],88672), ([12, 18, 31, 68],88619), ([17, 12, 18, 31, 68],88610), ([16, 68],87933), ([17, 16, 68],87924), ([12, 16, 68],87847), ([17, 12, 16, 68],87838), ([18, 16, 68],87644), ([17, 18, 16, 68],87635), ([12, 18, 16, 68],87589), ([17, 12, 18, 16, 68],87580), ([16, 31, 68],86362), ([17, 16, 31, 68],86353), ([12, 16, 31, 68],86279), ([17, 12, 16, 31, 68],86270), ([18, 16, 31, 68],86082), ([17, 18, 16, 31, 68],86073), ([12, 18, 16, 31, 68],86027), ([17, 12, 18, 16, 31, 68],86018), ([31, 21, 68],85090), ([17, 31, 21, 68],85081), ([12, 31, 21, 68],84903), ([17, 12, 31, 21, 68],84894), ([17, 12, 18, 31, 21, 68],84653), ([16, 21, 68],83908), ([12, 16, 21, 68],83829), ([18, 16, 21, 68],83639), ([17, 18, 16, 21, 68],83630), ([12, 18, 16, 21, 68],83587), ([17, 12, 18, 16, 21, 68],83578), ([16, 31, 21, 68],82495), ([17, 16, 31, 21, 68],82486), ([12, 16, 31, 21, 68],82418), ([17, 12, 16, 31, 21, 68],82409), ([18, 16, 31, 21, 68],82232), ([17, 18, 16, 31, 21, 68],82223), ([12, 18, 16, 31, 21, 68],82180)
Key: 335: Value: ([335],90909), ([17, 335],90903), ([12, 335],90869), ([17, 12, 335],90863), ([18, 335],90754), ([17, 18, 335],90748), ([12, 18, 335],90718), ([17, 12, 18, 335],90712), ([16, 335],89080), ([17, 16, 335],89074), ([12, 16, 335],89049), ([17, 12, 16, 335],89043), ([18, 16, 335],88932), ([17, 18, 16, 335],88926), ([12, 18, 16, 335],88901), ([17, 12, 18, 16, 335],88895), ([31, 335],84776), ([17, 31, 335],84771), ([12, 31, 335],84744), ([17, 12, 31, 335],84739), ([18, 31, 335],84647), ([17, 18, 31, 335],84642), ([12, 18, 31, 335],84618), ([17, 12, 18, 31, 335],84613), ([16, 31, 335],83373), ([17, 16, 31, 335],83368), ([12, 16, 31, 335],83348), ([17, 12, 16, 31, 335],83343), ([18, 16, 31, 335],83249), ([17, 18, 16, 31, 335],83244), ([12, 18, 16, 31, 335],83224), ([17, 12, 18, 16, 31, 335],83219), ([17, 18, 16, 21, 335],78117), ([12, 18, 16, 21, 335],78093), ([17, 12, 18, 16, 21, 335],78087), ([31, 21, 335],74945), ([17, 31, 21, 335],74940), ([12, 31, 21, 335],74915), ([17, 12, 31, 21, 335],74910), ([18, 31, 21, 335],74828), ([17, 18, 31, 21, 335],74823), ([12, 18, 31, 21, 335],74800), ([17, 12, 18, 31, 21, 335],74795), ([16, 31, 21, 335],73641), ([17, 16, 31, 21, 335],73636), ([12, 16, 31, 21, 335],73617), ([17, 12, 16, 31, 21, 335],73612), ([18, 16, 31, 21, 335],73528), ([17, 18, 16, 31, 21, 335],73523), ([12, 18, 16, 31, 21, 335],73504)
Key: 64: Value: ([64],95673), ([17, 64],95662), ([12, 64],95501), ([17, 12, 64],95490), ([18, 64],95407), ([17, 18, 64],95396), ([12, 18, 64],95352), ([17, 12, 18, 64],95341), ([16, 64],94511), ([17, 16, 64],94500), ([12, 16, 64],94439), ([17, 12, 16, 64],94428), ([18, 16, 64],94343), ([17, 18, 16, 64],94332), ([12, 18, 16, 64],94290), ([17, 12, 18, 16, 64],94279), ([31, 64],91275), ([17, 31, 64],91265), ([12, 31, 64],91124), ([17, 12, 31, 64],91114), ([18, 31, 64],91030), ([17, 18, 31, 64],91020), ([12, 18, 31, 64],90987), ([17, 12, 18, 31, 64],90977), ([16, 31, 64],90304), ([17, 16, 31, 64],90294), ([12, 16, 31, 64],90246), ([17, 12, 16, 31, 64],90236), ([18, 16, 31, 64],90150), ([17, 18, 16, 31, 64],90140), ([12, 18, 16, 31, 64],90109), ([17, 12, 18, 16, 31, 64],90099), ([17, 18, 16, 21, 64],82484), ([12, 18, 16, 21, 64],82445), ([17, 12, 18, 16, 21, 64],82435), ([31, 21, 64],80204), ([17, 31, 21, 64],80195), ([12, 31, 21, 64],80072), ([17, 12, 31, 21, 64],80063), ([18, 31, 21, 64],79989), ([17, 18, 31, 21, 64],79980), ([12, 18, 31, 21, 64],79949), ([17, 12, 18, 31, 21, 64],79940), ([16, 31, 21, 64],79344), ([17, 16, 31, 21, 64],79335), ([12, 16, 31, 21, 64],79291), ([17, 12, 16, 31, 21, 64],79282), ([18, 16, 31, 21, 64],79206), ([17, 18, 16, 31, 21, 64],79197), ([12, 18, 16, 31, 21, 64],79168)
Key: 5: Value: ([5],96818), ([17, 5],96815), ([12, 5],96711), ([17, 12, 5],96708), ([18, 5],96613), ([17, 18, 5],96610), ([12, 18, 5],96582), ([17, 12, 18, 5],96579), ([16, 5],95797), ([17, 16, 5],95794), ([12, 16, 5],95752), ([17, 12, 16, 5],95749), ([18, 16, 5],95655), ([17, 18, 16, 5],95652), ([12, 18, 16, 5],95625), ([17, 12, 18, 16, 5],95622), ([31, 5],94517), ([17, 31, 5],94514), ([12, 31, 5],94415), ([17, 12, 31, 5],94412), ([18, 31, 5],94320), ([17, 18, 31, 5],94317), ([12, 18, 31, 5],94292), ([17, 12, 18, 31, 5],94289), ([16, 31, 5],93587), ([17, 16, 31, 5],93584), ([12, 16, 31, 5],93544), ([17, 12, 16, 31, 5],93541), ([18, 16, 31, 5],93451), ([17, 18, 16, 31, 5],93448), ([12, 18, 16, 31, 5],93423), ([17, 12, 18, 16, 31, 5],93420), ([17, 18, 16, 21, 5],90130), ([12, 18, 16, 21, 5],90104), ([17, 12, 18, 16, 21, 5],90101), ([31, 21, 5],89273), ([17, 31, 21, 5],89270), ([12, 31, 21, 5],89179), ([17, 12, 31, 21, 5],89176), ([18, 31, 21, 5],89089), ([17, 18, 31, 21, 5],89086), ([12, 18, 31, 21, 5],89062), ([17, 12, 18, 31, 21, 5],89059), ([16, 31, 21, 5],88402), ([17, 16, 31, 21, 5],88399), ([12, 16, 31, 21, 5],88360), ([17, 12, 16, 31, 21, 5],88357), ([18, 16, 31, 21, 5],88272), ([17, 18, 16, 31, 21, 5],88269), ([12, 18, 16, 31, 21, 5],88245)

What's the interpretation of this output?  Is this the right place to look?  What about the other directories?

-Grant

Re: FP Growth Understanding

Robin Anil
Each key is a feature, and each value is the list of top-K frequent patterns
in which that feature occurs.

From here, one can use this information for pattern recommendation (query
recommendation, as in the original PFP-Growth paper),

or one can write an M/R job that counts support and confidence and creates
association rules (yet to be done).

Those would look like

f1, f2, f3, f4 => f5 (support, confidence)

http://publib.boulder.ibm.com/infocenter/db2luw/v8/index.jsp?topic=/com.ibm.im.model.doc/c_confidence_in_an_association_rule.html
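
As a rough illustration of the rule form above, here is a minimal Python sketch that derives a confidence value from pattern supports. The support counts are copied from the seqdump output for key 68 earlier in the thread; the `confidence` helper and the dictionary layout are hypothetical and not part of Mahout. Support is kept as an absolute count here, whereas some definitions divide it by the number of transactions.

```python
# Supports copied from the seqdump output for key 68 in this thread.
# The dictionary layout and helper name are illustrative assumptions.
support = {
    frozenset({68}): 90692,
    frozenset({12, 68}): 90490,
    frozenset({17, 12, 68}): 90481,
}


def confidence(antecedent, consequent, support):
    """confidence(A => B) = support(A union B) / support(A)."""
    a = frozenset(antecedent)
    both = a | frozenset(consequent)
    return support[both] / support[a]


# Rule 68 => 12: how often 12 appears in transactions that contain 68.
rule_conf = confidence({68}, {12}, support)
print(f"68 => 12 (support={support[frozenset({12, 68})]}, confidence={rule_conf:.4f})")
```

A rule can only be scored this way when both the antecedent and the full itemset appear among the mined patterns, which is one reason a separate counting job was suggested.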

Robin


Re: FP Growth Understanding

Grant Ingersoll-2

On Feb 14, 2010, at 11:37 PM, Robin Anil wrote:

> Each key is a feature and each attribute is the topK frequent patterns where
> the feature exist

Still a bit confused.
Given:
Key: 68: Value: ([68],90692), ([17, 68],90683), ([12, 68],90490), ([17, 12, 68],90481), ([18, 68],90291), ([17, 18, 68],90282), ([12, 18, 68],90229), ([17, 12, 18, 68],90220), ([31, 68],89071), ([17, 31, 68],89062), ([12, 31, 68],88874), ([17, 12, 31, 68],88865), ([18, 31, 68],88681), ([17, 18, 31, 68],88672), ([12, 18, 31, 68],88619), ([17, 12, 18, 31, 68],88610), ([16, 68],87933),

So, 68 is the feature in question.  That makes sense.  Then what is the significance of the bracketed parts, as in ([68],90692) or ([17, 12, 68],90481)?  Why all the repetition?

-Grant

Re: FP Growth Understanding

Robin Anil
OK, a bit more background.

An itemset is a subset of the items I1, I2, I3, ..., In,

so [I2, I4, I7] is an itemset, and its support (the number of times it
appears in the dataset) is, say, Y.

A pattern is a Pair<Itemset, support>.

Take a look at the output in this format:

68:
     ([68],90692),
     ([17, 68],90683),
     ([12, 68],90490),
     ([17, 12, 68],90481),
     ([18, 68],90291)

These are the top patterns containing 68, with their support, in descending
order of support. For example, 68 occurs together with 12 90490 times.
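
To make the (itemset, support) structure concrete, the following Python sketch parses one `Key: ... Value: ...` line of the seqdump text output into a key and a list of patterns. It assumes the textual layout shown in this thread; the function name and regular expression are illustrative and not part of Mahout.

```python
import re

# Matches one "([a, b, c],count)" group as printed by seqdump in this thread.
PATTERN_RE = re.compile(r"\(\[([^\]]*)\],\s*(\d+)\)")


def parse_seqdump_line(line):
    """Return (key, [(itemset, support), ...]) for one 'Key: K: Value: ...' line."""
    head, _, value = line.partition(": Value: ")
    key = head[len("Key: "):] if head.startswith("Key: ") else head
    patterns = []
    for items, count in PATTERN_RE.findall(value):
        itemset = [tok.strip() for tok in items.split(",") if tok.strip()]
        patterns.append((itemset, int(count)))
    return key, patterns


line = ("Key: 68: Value: ([68],90692), ([17, 68],90683), ([12, 68],90490), "
        "([17, 12, 68],90481), ([18, 68],90291)")
key, patterns = parse_seqdump_line(line)
for itemset, count in patterns:
    print(key, itemset, count)
```

Items are kept as strings because the job was run on tokenized text; convert them to integers only if the dataset is known to use numeric item IDs.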

Robin



Re: FP Growth Understanding

Neal Richter-3
Grant:  Chapter 5 of Han and Kamber (Data Mining: Concepts and
Techniques) covers itemset mining and the FP-Growth algorithm in detail.
Han is a co-inventor of it.

There is a bit of repetition in the output compared to other itemset
mining packages, though this structure is convenient for relational
indexing by key.

- Neal


Re: FP Growth Understanding

Robin Anil
Hi Neal,
             I know there is repetition. I tried to stay true to the
original algorithm, which finds closed patterns and keeps the longest
one.

Say 68 and 12 occur together 1000 times,
and 68 12 17 also occurs 1000 times. Then there is no extra information
that the former pattern gives you, so you can remove it. You then say that
68 12 17 is a closed pattern, and all the patterns it encloses are removed.

Had 68 alone occurred 2000 times, it could not be removed this way, because
its support differs from that of 68 12 17.

Things could be made configurable by having a flag to remove closed patterns
within a percentage of the support, or to mine only patterns > 3 items in
length. These are tricky but could be done.
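
The closed-pattern idea described above can be sketched in a few lines of Python. This is only an illustration over an already-mined list of patterns, using the hypothetical counts from this message (1000 and 2000); it is not how Mahout's implementation works internally, and the function name is made up for the example.

```python
def closed_patterns(patterns):
    """Drop every pattern that has a strict superset with the same support.

    `patterns` is a list of (itemset, support) pairs. Only supersets that
    are present in the list are considered.
    """
    items = [(frozenset(itemset), sup) for itemset, sup in patterns]
    kept = []
    for itemset, sup in items:
        subsumed = any(
            itemset < other and sup == other_sup for other, other_sup in items
        )
        if not subsumed:
            kept.append((sorted(itemset), sup))
    return kept


# Counts taken from the example in this message.
example = [([68], 2000), ([12, 68], 1000), ([12, 17, 68], 1000)]
result = closed_patterns(example)
print(result)
```

Here [12, 68] is dropped because [12, 17, 68] has the same support, while [68] survives because its support is higher than that of any superset. The pairwise comparison is quadratic in the number of patterns, which is acceptable for a top-K list of 50 but not for a full pattern set.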

Robin



Re: FP Growth Understanding

Neal Richter-3
I have no problem with the repetition!

I'll have to poke at this a bit more, but I like the idea of switches.
I often use Christian Borgelt's itemset implementations for playing
with data.  He has implemented a nice set of switches; see below.
Setting a minimum support threshold and a minimum itemset size are both
convenient, and they tend to make the algorithm run a bit faster.

http://www.borgelt.net/software.html

nealr@nrichter-laptop:~$ fpgrowth_fim
usage: fpgrowth_fim [options] infile outfile
find frequent item sets with the fpgrowth algorithm
version 1.13 (2008.05.02)        (c) 2004-2008   Christian Borgelt
-m#      minimal number of items per item set (default: 1)
-n#      maximal number of items per item set (default: no limit)
-s#      minimal support of an item set (default: 10%)
         (positive: percentage, negative: absolute number)
-d#      minimal binary logarithm of support quotient (default: none)
-p#      output format for the item set support (default: "%.1f")
-a       print absolute support (number of transactions)
-g       write output in scanable form (quote certain characters)
-q#      sort items w.r.t. their frequency (default: -2)
         (1: ascending, -1: descending, 0: do not sort,
          2: ascending, -2: descending w.r.t. transaction size sum)
-u       use alternative tree projection method
-z       do not prune tree projections to bonsai
-j       use quicksort to sort the transactions (default: heapsort)
-i#      ignore records starting with a character in the given string
-b/f/r#  blank characters, field and record separators
         (default: " \t\r", " \t", "\n")
infile   file to read transactions from
outfile  file to write frequent item se
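
For comparison, the effect of a minimum-support and a minimum-size switch can be approximated after the fact on a pattern list such as the one printed for key 68. The Python sketch below is a post-filter only, with hypothetical parameter names; unlike real switches it cannot speed up mining, because the pruning happens after the patterns have been produced.

```python
def filter_patterns(patterns, min_support=1, min_items=1):
    """Keep patterns with at least `min_support` count and `min_items` items."""
    return [
        (itemset, sup)
        for itemset, sup in patterns
        if sup >= min_support and len(itemset) >= min_items
    ]


# First five patterns for key 68, copied from the output in this thread.
patterns = [
    ([68], 90692),
    ([17, 68], 90683),
    ([12, 68], 90490),
    ([17, 12, 68], 90481),
    ([18, 68], 90291),
]
filtered = filter_patterns(patterns, min_support=90400, min_items=2)
print(filtered)
```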


Re: FP Growth Understanding

Robin Anil
Cool. Thanks for sharing this. I will file a JIRA issue for this.

Robin




Re: FP Growth Understanding

Neal Richter-3
Note that there is a sort-of standard input and output spec for itemset
mining that was defined for the FIMI'03 and FIMI'04 workshops.

http://fimi.cs.helsinki.fi/
http://fimi.cs.helsinki.fi/fimi04/rules.html

Having a switch to adhere to that simple standard could be useful as well.

Code submitted to that workshop also implemented open, closed and
maximal itemsets.

- Neal

On Mon, Feb 15, 2010 at 9:25 AM, Robin Anil <[hidden email]> wrote:

> Cool. Thanks for sharing this. I will file a jira issue over this.
>
> Robin
>
>
>
> On Mon, Feb 15, 2010 at 9:52 PM, Neal Richter <[hidden email]> wrote:
>
>> I have no problem with the repetition!
>>
>> I'll have to poke at this a bit more, but I like the switches ideas.
>> I often use Christian Borgelt's itemset implementations for playing
>> with data.  He's implemented a nice set of switches, see below.
>> Setting a minimum support threshold and mimimum itemset size are both
>> convenient and tend to make the algorithm run a bit faster.
>>
>> http://www.borgelt.net/software.html
>>
>> nealr@nrichter-laptop:~$ fpgrowth_fim
>> usage: fpgrowth_fim [options] infile outfile
>> find frequent item sets with the fpgrowth algorithm
>> version 1.13 (2008.05.02)        (c) 2004-2008   Christian Borgelt
>> -m#      minimal number of items per item set (default: 1)
>> -n#      maximal number of items per item set (default: no limit)
>> -s#      minimal support of an item set (default: 10%)
>>         (positive: percentage, negative: absolute number)
>> -d#      minimal binary logarithm of support quotient (default: none)
>> -p#      output format for the item set support (default: "%.1f")
>> -a       print absolute support (number of transactions)
>> -g       write output in scanable form (quote certain characters)
>> -q#      sort items w.r.t. their frequency (default: -2)
>>         (1: ascending, -1: descending, 0: do not sort,
>>          2: ascending, -2: descending w.r.t. transaction size sum)
>> -u       use alternative tree projection method
>> -z       do not prune tree projections to bonsai
>> -j       use quicksort to sort the transactions (default: heapsort)
>> -i#      ignore records starting with a character in the given string
>> -b/f/r#  blank characters, field and record separators
>>         (default: " \t\r", " \t", "\n")
>> infile   file to read transactions from
>> outfile  file to write frequent item se
>>
>> On Mon, Feb 15, 2010 at 9:14 AM, Robin Anil <[hidden email]> wrote:
>> > Hi Neal,
>> >             I know there is repetition. I tried to stick to the
>> > original algorithm, which finds closed patterns and keeps the longest
>> > one.
>> >
>> > Say 68 and 12 occur together 1000 times,
>> > and 68 12 17 also occur together 1000 times. Then there is no extra
>> > information that the former pattern gives you, so you can remove it.
>> > Therefore you say that 68 12 17 is a closed pattern, and all the
>> > patterns it encloses with the same support are removed.
>> >
>> > Had 68 alone occurred 2000 times, it would be kept as a separate
>> > pattern, because 68 12 17 no longer subsumes it.
>> >
>> > Things could be made configurable by having a flag to remove closed
>> > patterns within a percentage of the support, or to mine only patterns
>> > of more than 3 items. These are tricky but could be done.
>> >
>> > Robin
>> >
>> >
>> > On Mon, Feb 15, 2010 at 9:34 PM, Neal Richter <[hidden email]> wrote:
>> >
>> >> Grant:  Chapter 5 of Han and Kamber (Data Mining: Concepts and
>> >> Techniques) details itemset mining and the FP-growth algorithm.  Han is a
>> >> co-inventor of it.
>> >>
>> >> There is a bit of repetition in the output compared to other itemset
>> >> mining packages, though this structure is convenient for relational
>> >> indexing by key.
>> >>
>> >> - Neal
>> >>
>> >> On Mon, Feb 15, 2010 at 6:49 AM, Robin Anil <[hidden email]> wrote:
>> >> > Ok.. A bit more background..
>> >> >
>> >> > An itemset is a subset of the items I1, I2, I3, ..., In.
>> >> >
>> >> > So [I2, I4, I7] is an itemset, and its support (the number of times
>> >> > it appears in the dataset) is, say, Y.
>> >> >
>> >> > A Pattern is Pair<Itemset, support>
>> >> >
>> >> > Take a look at it in this format:
>> >> >
>> >> > 68:
>> >> >     ([68],90692),
>> >> >     ([17, 68],90683),
>> >> >     ([12, 68],90490),
>> >> >     ([17, 12, 68],90481),
>> >> >     ([18, 68],90291)
>> >> >
>> >> > These are the top patterns containing 68, with their support in
>> >> > descending order. For example, 68 and 12 occur together 90490 times.
>> >> >
>> >> > Robin
>> >> >
>> >> >
>> >> > On Mon, Feb 15, 2010 at 6:27 PM, Grant Ingersoll <[hidden email]> wrote:
>> >> >
>> >> >>
>> >> >> On Feb 14, 2010, at 11:37 PM, Robin Anil wrote:
>> >> >>
>> >> >> > Each key is a feature, and each value is the top-K frequent
>> >> >> > patterns in which that feature occurs.
>> >> >>
>> >> >> Still a bit confused.
>> >> >> Given:
>> >> >> Key: 68: Value: ([68],90692), ([17, 68],90683), ([12, 68],90490),
>> >> >> ([17, 12, 68],90481), ([18, 68],90291), ([17, 18, 68],90282),
>> >> >> ([12, 18, 68],90229), ([17, 12, 18, 68],90220), ([31, 68],89071),
>> >> >> ([17, 31, 68],89062), ([12, 31, 68],88874), ([17, 12, 31, 68],88865),
>> >> >> ([18, 31, 68],88681), ([17, 18, 31, 68],88672), ([12, 18, 31, 68],88619),
>> >> >> ([17, 12, 18, 31, 68],88610), ([16, 68],87933),
>> >> >>
>> >> >> So, 68 is the feature in question.  That makes sense.  Then, what is
>> >> >> the significance of the [] areas, as in [68],90692 or [17,12,68], 90481.
>> >> >> Why all the repetition?
>> >> >>
>> >> >> -Grant
>> >> >
>> >>
>> >
>>
>
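
The closed-pattern rule described in the quoted thread above (drop a pattern when a longer pattern containing it has exactly the same support) can be sketched in a few lines of Python. This is only an illustration of the rule as stated in the thread, not Mahout's actual code; the function name `drop_subsumed` and the 1000/2000 counts are hypothetical and simply mirror Robin's example.

```python
from typing import FrozenSet, List, Tuple

# A pattern is an (itemset, support) pair, as in Pair<Itemset, support>.
Pattern = Tuple[FrozenSet[int], int]


def drop_subsumed(patterns: List[Pattern]) -> List[Pattern]:
    """Keep only patterns that have no proper superset with equal support."""
    kept = []
    for items, support in patterns:
        subsumed = any(
            items < other_items and support == other_support
            for other_items, other_support in patterns
        )
        if not subsumed:
            kept.append((items, support))
    return kept


# Hypothetical counts mirroring the example in the thread.
patterns = [
    (frozenset({68, 12}), 1000),      # same support as its superset: dropped
    (frozenset({68, 12, 17}), 1000),  # closed: kept
    (frozenset({68}), 2000),          # different support: kept
]
closed = drop_subsumed(patterns)
```

In Grant's dump the supports differ at every step (for example [68] at 90692 versus [17, 68] at 90683), so under this rule none of those patterns would be dropped, which is consistent with the repetition he sees.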

Re: FP Growth Understanding

Robin Anil
It already does. Check the Mahout wiki.

On Mon, Feb 15, 2010 at 10:08 PM, Neal Richter <[hidden email]> wrote:

> Note that there is sort-of standard input and output spec for itemset
> mining that was defined for the FIMI'03 and FIMI'04 workshops.
>
> http://fimi.cs.helsinki.fi/
> http://fimi.cs.helsinki.fi/fimi04/rules.html
>
> Having a switch to adhere to that simple standard could be useful as well.
>
> Code submitted to that workshop also implemented open, closed and
> maximal itemsets as well.
>
> - Neal
>
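
For readers who want to compare the dumped patterns with other itemset miners, here is a rough Python sketch that turns the text of a seqdump `Value:` field into one line per itemset, with the items separated by spaces and the support in parentheses. That line layout is my understanding of the FIMI workshop output convention mentioned in this post, so treat it as an assumption and check the rules page linked there. The helper names `parse_patterns` and `to_fimi_lines` are made up for this example and are not part of Mahout.

```python
import re
from typing import List, Tuple

# Matches one "([17, 68],90683)" group from the dumped value text.
PATTERN_RE = re.compile(r"\(\[([^\]]*)\],\s*(\d+)\)")


def parse_patterns(value: str) -> List[Tuple[List[str], int]]:
    """Split a dumped value string into (items, support) pairs."""
    return [
        ([item.strip() for item in items.split(",")], int(support))
        for items, support in PATTERN_RE.findall(value)
    ]


def to_fimi_lines(value: str) -> List[str]:
    """Render each pattern as 'item item ... (support)' (assumed layout)."""
    return [
        f"{' '.join(items)} ({support})"
        for items, support in parse_patterns(value)
    ]


# First three patterns from the dump quoted in this thread.
sample = "([68],90692), ([17, 68],90683), ([12, 68],90490)"
lines = to_fimi_lines(sample)
for line in lines:
    print(line)
```

Items are kept as strings rather than integers because the Mahout output class in the dump is a string-pattern type, so features need not be numeric.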

Re: FP Growth Understanding

rulinma
This post has NOT been accepted by the mailing list yet.
In reply to this post by Robin Anil
think so.