In my last post, I tried to tackle getting the Top N Results per Group from a BigQuery dataset.
When I tweeted out the post, I got some great feedback and suggestions for more efficient ways to get the same results, so in this post I want to try to understand why the alternatives are more efficient.
EDIT: After I posted this initially, I got some great feedback, so I wrote a follow-up post here.
In this post, we are going to explore a strategy for collecting the Top N results per Group over a mixed dataset, all in a single query.
I stumbled onto this solution the other day, mostly driven by the fear that I was re-scanning my BigQuery data too often. At the time, the only way I knew how to look at a Top 10 list for a subset of the data was to add a WHERE clause limiting the whole data set to a single group and combine it with ORDER BY and LIMIT clauses.
For each group, I would just modify the WHERE clause, rescan all the data, and get new results. I thought there had to be an easier way to get the same ordered subset for any particular group in the data, all at once.
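As a rough sketch of that pattern (the project, table, and column names here are made up, not taken from the original data set), each group needed its own query:

```sql
-- Hypothetical example: the top 10 most-starred repos for one language.
-- Getting the list for another language meant changing the WHERE clause
-- and re-running (and re-scanning) against the whole table.
SELECT
  repo_name,
  language,
  stars
FROM
  `my_project.my_dataset.repos`
WHERE
  language = 'JavaScript'
ORDER BY
  stars DESC
LIMIT 10;
```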
It turns out, there is a much more efficient way to solve this problem.
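To make that concrete, here is a sketch of one common single-pass pattern, using a window function over the same hypothetical table as above; it is not necessarily the exact query this post builds up to, just an illustration of ranking within groups in a single query:

```sql
-- Hypothetical sketch: scan the table once, rank rows within each group,
-- and keep only the top 10 per group, so no per-group WHERE clause is needed.
SELECT
  repo_name,
  language,
  stars
FROM (
  SELECT
    repo_name,
    language,
    stars,
    ROW_NUMBER() OVER (PARTITION BY language ORDER BY stars DESC) AS row_num
  FROM
    `my_project.my_dataset.repos`
)
WHERE
  row_num <= 10
ORDER BY
  language, stars DESC;
```

On a plain (non-partitioned, non-clustered) table, each query scans the full set of referenced columns no matter what the WHERE clause says, so the single-pass version reads the data once instead of once per group.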
It’s easy to blow your BigQuery budget when you are exploring a new data set. Because you’re billed for the amount of data scanned, not the size of the result set, exploratory queries can get wasteful fast when you don’t yet know what you’re looking for.
In this post, I’m going to share some tips for more efficiently scanning data in BigQuery when you don’t quite know what you need.