In this post, we’re going to look at a techinque for finding the lastest full record for each member of a group. Now, it might not be obvious why this is an annoying problem, but I see people back into having this problem from two different directions. First, you know how to find the MAX date for each member in a group: metric | latest_metric ——–+———————— A | 2019-04-30 00:00:00-04 B | 2019-05-08 00:00:00-04 C | 2019-05-08 00:00:00-04 D | 2019-05-08 00:00:00-04 E | 2019-05-08 00:00:00-04 F | 2019-05-08 00:00:00-04 (6 rows) But you need additional column data from the same row as that most recent date.
Whenever you have two sets of data and you need to find the entries that are in your first set, but not in your second set, use this pattern. Let’s say you have an application that tracks user sign-ups separately for user sign-ins. You might be interested in knowing which users have signed up, but never signed in. Postgres makes it easy to mock up some sample data so we can work through this use case.
In this tip, we want to look at a concise way to shows changes in your data. I tend to think of this type of problem as a going from “finding” data to “describing” data. For example, if you know how to get every value for a user in the database for the last 30 days, then you can “find” data. When you calculate aggregates of that data using functions like MAX, MIN, SUM, or AVG, you are now “describing” the data.
When you are reporting on metrics over time, sometimes your data will have missing entries on certain days.
In these cases, it’s useful to be able to ensure that every date shows up in your report, regardless of whether or not there is a metric in the dataset for that date.
Let’s use daily user logins to a website for a reporting metric to illustrate how you solve this problem.
Usually, when you add an
ORDER BY clause to your SQL query, you want to sort by your columns’ values.
To track the top 10 cryptocurrencies by price over the last 90 days, for example, you would write a query like this:
Warning: This post is NSFW. In this series, we are going to build a really, really simple database management system that you should by no means use in a production work environment. Here’s the experiment: Start with a naive implementation of a database – read and write from a local csv file Reach for all the normal features we use every day: basic CRUD, aggregate queries, complex joins Realize that our basic implementation falls short Look at how these problems are solved in modern systems Think of the whole series as one giant experiment in Cunningham’s Law: “the best way to get the right answer on the internet is not to ask a question; it’s to post the wrong answer.
In my last post, I wrote about steps you can take to make writing complicated queries more manageable. One aspect that I didn’t cover in that post is how to set sample data to work with during the writing process. Assuming you’re not working directly in your production database as you test out new queries (right? right?? right???), you need some way to work on your new ideas. In this post, I want to share some tips and tricks for creating reliable, reproduceable test data to help you develop new ideas in SQL.
How many times have you started off building a complicated analytical SQL query like this? SELECT … . uh??? . . SELECT * FROM … SELECT … And you get stuck trying to figure out exactly what you want to select. You’re thinking about averages, group by’s, the order of your results or some change you want to see over time and the query editor is just sitting there, taunting you, because in SQL, you have to know up front what you want to select into your final results.