Buzz measurement - Aggregate functions

classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

Buzz measurement - Aggregate functions

Marcus Herou
Hi.

Anyone have an idea of how I would create a query which finds the data
backing a trend graph where date is X and num(docs) is on Y axis ?

This is quite a common use case in "buzz" analysis and currently I'm doing a
stupid query which iterates over the date range and queries lucene for every
date. Not very fast and not very flexible.

More specifically something like this but I need to add free text query as
well and then I cannot use MySQL for performance reasons. Any ideas ?

--clip--
mysql> select count(id) as Y,publishDate as X from FeedItem where
publishDate between "2008-08-01" and "2008-08-31" group by DAY(publishDate)
order by publishDate asc;
+-------+---------------------+
| Y     | X                   |
+-------+---------------------+
| 26663 | 2008-08-01 00:00:00 |
| 22478 | 2008-08-02 00:00:00 |
| 25745 | 2008-08-03 00:00:00 |
| 30576 | 2008-08-04 00:00:00 |
| 31351 | 2008-08-05 00:00:00 |
| 31084 | 2008-08-06 00:00:00 |
| 31245 | 2008-08-07 00:00:00 |
| 29518 | 2008-08-08 00:00:00 |
| 26001 | 2008-08-09 00:00:00 |
| 28687 | 2008-08-10 00:00:00 |
| 32957 | 2008-08-11 00:00:00 |
| 33251 | 2008-08-12 00:00:00 |
| 33062 | 2008-08-13 00:00:00 |
| 33960 | 2008-08-14 00:00:00 |
| 31034 | 2008-08-15 00:00:00 |
| 26726 | 2008-08-16 00:00:00 |
| 27543 | 2008-08-17 00:00:00 |
| 36887 | 2008-08-18 00:00:00 |
| 35376 | 2008-08-19 00:00:00 |
| 34573 | 2008-08-20 00:00:00 |
| 33889 | 2008-08-21 00:00:00 |
| 30604 | 2008-08-22 00:00:00 |
| 26875 | 2008-08-23 00:00:00 |
| 27356 | 2008-08-24 00:00:00 |
| 33438 | 2008-08-25 00:00:00 |
| 33102 | 2008-08-26 00:00:00 |
| 31720 | 2008-08-27 00:00:00 |
| 26133 | 2008-08-28 00:00:00 |
| 22781 | 2008-08-29 00:00:00 |
| 20198 | 2008-08-30 00:00:00 |
|    20 | 2008-08-31 00:00:00 |
+-------+---------------------+


--
Marcus Herou CTO and co-founder Tailsweep AB
+46702561312
[hidden email]
http://www.tailsweep.com/
http://blogg.tailsweep.com/



--
Marcus Herou CTO and co-founder Tailsweep AB
+46702561312
[hidden email]
http://www.tailsweep.com/
http://blogg.tailsweep.com/
Reply | Threaded
Open this post in threaded view
|

Re: Buzz measurement - Aggregate functions

Uri Boness
you can try using the field collapse patch (currently in JIRA). You'll
probably need to manually extract the patch code and apply it yourself
as its latest update only applies to an earlier version of solr (1.3-dev).

http://issues.apache.org/jira/browse/SOLR-236

Cheers,
Uri

Marcus Herou wrote:

> Hi.
>
> Anyone have an idea of how I would create a query which finds the data
> backing a trend graph where date is X and num(docs) is on Y axis ?
>
> This is quite a common use case in "buzz" analysis and currently I'm doing a
> stupid query which iterates over the date range and queries lucene for every
> date. Not very fast and not very flexible.
>
> More specifically something like this but I need to add free text query as
> well and then I cannot use MySQL for performance reasons. Any ideas ?
>
> --clip--
> mysql> select count(id) as Y,publishDate as X from FeedItem where
> publishDate between "2008-08-01" and "2008-08-31" group by DAY(publishDate)
> order by publishDate asc;
> +-------+---------------------+
> | Y     | X                   |
> +-------+---------------------+
> | 26663 | 2008-08-01 00:00:00 |
> | 22478 | 2008-08-02 00:00:00 |
> | 25745 | 2008-08-03 00:00:00 |
> | 30576 | 2008-08-04 00:00:00 |
> | 31351 | 2008-08-05 00:00:00 |
> | 31084 | 2008-08-06 00:00:00 |
> | 31245 | 2008-08-07 00:00:00 |
> | 29518 | 2008-08-08 00:00:00 |
> | 26001 | 2008-08-09 00:00:00 |
> | 28687 | 2008-08-10 00:00:00 |
> | 32957 | 2008-08-11 00:00:00 |
> | 33251 | 2008-08-12 00:00:00 |
> | 33062 | 2008-08-13 00:00:00 |
> | 33960 | 2008-08-14 00:00:00 |
> | 31034 | 2008-08-15 00:00:00 |
> | 26726 | 2008-08-16 00:00:00 |
> | 27543 | 2008-08-17 00:00:00 |
> | 36887 | 2008-08-18 00:00:00 |
> | 35376 | 2008-08-19 00:00:00 |
> | 34573 | 2008-08-20 00:00:00 |
> | 33889 | 2008-08-21 00:00:00 |
> | 30604 | 2008-08-22 00:00:00 |
> | 26875 | 2008-08-23 00:00:00 |
> | 27356 | 2008-08-24 00:00:00 |
> | 33438 | 2008-08-25 00:00:00 |
> | 33102 | 2008-08-26 00:00:00 |
> | 31720 | 2008-08-27 00:00:00 |
> | 26133 | 2008-08-28 00:00:00 |
> | 22781 | 2008-08-29 00:00:00 |
> | 20198 | 2008-08-30 00:00:00 |
> |    20 | 2008-08-31 00:00:00 |
> +-------+---------------------+
>
>
>  

Reply | Threaded
Open this post in threaded view
|

Re: Buzz measurement - Aggregate functions

Marcus Herou
Man.... I friend made me realize that facets will do the exact thing I want!

Example:

GET "
http://192.168.10.12:8110/solr/feedItem/select/?indent=true&q=title:test*&rows=0&facet=true&facet.date=publishDate&facet.date.start=NOW/DAY-30DAYS&facet.date.end=NOW/DAY%2B1DAY&facet.date.gap=%2B1DAY
"
<?xml version="1.0" encoding="UTF-8"?>
<response>

<lst name="responseHeader">
 <int name="status">0</int>
 <int name="QTime">22</int>
 <lst name="params">
  <str name="facet.date.start">NOW/DAY-30DAYS</str>
  <str name="facet">true</str>
  <str name="indent">true</str>
  <str name="q">title:test*</str>
  <str name="facet.date">publishDate</str>
  <str name="facet.date.gap">+1DAY</str>
  <str name="facet.date.end">NOW/DAY+1DAY</str>
  <str name="rows">0</str>
 </lst>
</lst>
<result name="response" numFound="14814" start="0"/>
<lst name="facet_counts">
 <lst name="facet_queries"/>
 <lst name="facet_fields"/>
 <lst name="facet_dates">
  <lst name="publishDate">
        <int name="2008-09-10T00:00:00Z">52</int>
        <int name="2008-09-11T00:00:00Z">57</int>
        <int name="2008-09-12T00:00:00Z">34</int>
        <int name="2008-09-13T00:00:00Z">38</int>
        <int name="2008-09-14T00:00:00Z">32</int>
        <int name="2008-09-15T00:00:00Z">53</int>
        <int name="2008-09-16T00:00:00Z">38</int>
        <int name="2008-09-17T00:00:00Z">53</int>
        <int name="2008-09-18T00:00:00Z">64</int>
        <int name="2008-09-19T00:00:00Z">49</int>
        <int name="2008-09-20T00:00:00Z">30</int>
        <int name="2008-09-21T00:00:00Z">37</int>
        <int name="2008-09-22T00:00:00Z">43</int>
        <int name="2008-09-23T00:00:00Z">37</int>
        <int name="2008-09-24T00:00:00Z">40</int>
        <int name="2008-09-25T00:00:00Z">21</int>
        <int name="2008-09-26T00:00:00Z">23</int>
        <int name="2008-09-27T00:00:00Z">21</int>
        <int name="2008-09-28T00:00:00Z">34</int>
        <int name="2008-09-29T00:00:00Z">43</int>
        <int name="2008-09-30T00:00:00Z">50</int>
        <int name="2008-10-01T00:00:00Z">39</int>
        <int name="2008-10-02T00:00:00Z">49</int>
        <int name="2008-10-03T00:00:00Z">25</int>
        <int name="2008-10-04T00:00:00Z">22</int>
        <int name="2008-10-05T00:00:00Z">15</int>
        <int name="2008-10-06T00:00:00Z">28</int>
        <int name="2008-10-07T00:00:00Z">4</int>
        <int name="2008-10-08T00:00:00Z">0</int>
        <int name="2008-10-09T00:00:00Z">0</int>
        <int name="2008-10-10T00:00:00Z">0</int>
        <str name="gap">+1DAY</str>
        <date name="end">2008-10-11T00:00:00Z</date>
  </lst>
 </lst>
</lst>
</response>



On Fri, Oct 10, 2008 at 11:42 AM, Uri Boness <[hidden email]> wrote:

> you can try using the field collapse patch (currently in JIRA). You'll
> probably need to manually extract the patch code and apply it yourself as
> its latest update only applies to an earlier version of solr (1.3-dev).
>
> http://issues.apache.org/jira/browse/SOLR-236
>
> Cheers,
> Uri
>
>
> Marcus Herou wrote:
>
>> Hi.
>>
>> Anyone have an idea of how I would create a query which finds the data
>> backing a trend graph where date is X and num(docs) is on Y axis ?
>>
>> This is quite a common use case in "buzz" analysis and currently I'm doing
>> a
>> stupid query which iterates over the date range and queries lucene for
>> every
>> date. Not very fast and not very flexible.
>>
>> More specifically something like this but I need to add free text query as
>> well and then I cannot use MySQL for performance reasons. Any ideas ?
>>
>> --clip--
>> mysql> select count(id) as Y,publishDate as X from FeedItem where
>> publishDate between "2008-08-01" and "2008-08-31" group by
>> DAY(publishDate)
>> order by publishDate asc;
>> +-------+---------------------+
>> | Y     | X                   |
>> +-------+---------------------+
>> | 26663 | 2008-08-01 00:00:00 |
>> | 22478 | 2008-08-02 00:00:00 |
>> | 25745 | 2008-08-03 00:00:00 |
>> | 30576 | 2008-08-04 00:00:00 |
>> | 31351 | 2008-08-05 00:00:00 |
>> | 31084 | 2008-08-06 00:00:00 |
>> | 31245 | 2008-08-07 00:00:00 |
>> | 29518 | 2008-08-08 00:00:00 |
>> | 26001 | 2008-08-09 00:00:00 |
>> | 28687 | 2008-08-10 00:00:00 |
>> | 32957 | 2008-08-11 00:00:00 |
>> | 33251 | 2008-08-12 00:00:00 |
>> | 33062 | 2008-08-13 00:00:00 |
>> | 33960 | 2008-08-14 00:00:00 |
>> | 31034 | 2008-08-15 00:00:00 |
>> | 26726 | 2008-08-16 00:00:00 |
>> | 27543 | 2008-08-17 00:00:00 |
>> | 36887 | 2008-08-18 00:00:00 |
>> | 35376 | 2008-08-19 00:00:00 |
>> | 34573 | 2008-08-20 00:00:00 |
>> | 33889 | 2008-08-21 00:00:00 |
>> | 30604 | 2008-08-22 00:00:00 |
>> | 26875 | 2008-08-23 00:00:00 |
>> | 27356 | 2008-08-24 00:00:00 |
>> | 33438 | 2008-08-25 00:00:00 |
>> | 33102 | 2008-08-26 00:00:00 |
>> | 31720 | 2008-08-27 00:00:00 |
>> | 26133 | 2008-08-28 00:00:00 |
>> | 22781 | 2008-08-29 00:00:00 |
>> | 20198 | 2008-08-30 00:00:00 |
>> |    20 | 2008-08-31 00:00:00 |
>> +-------+---------------------+
>>
>>
>>
>>
>
>


--
Marcus Herou CTO and co-founder Tailsweep AB
+46702561312
[hidden email]
http://www.tailsweep.com/
http://blogg.tailsweep.com/