Warning: This post is NSFW. In this series, we are going to build a really, really simple database management system that you should by no means use in a production work environment. Here’s the experiment: Start with a naive implementation of a database – read and write from a local csv file Reach for all the normal features we use every day: basic CRUD, aggregate queries, complex joins Realize that our basic implementation falls short Look at how these problems are solved in modern systems Think of the whole series as one giant experiment in Cunningham’s Law: “the best way to get the right answer on the internet is not to ask a question; it’s to post the wrong answer.
In my last post, I wrote about steps you can take to make writing complicated queries more manageable. One aspect that I didn’t cover in that post is how to set sample data to work with during the writing process. Assuming you’re not working directly in your production database as you test out new queries (right? right?? right???), you need some way to work on your new ideas. In this post, I want to share some tips and tricks for creating reliable, reproduceable test data to help you develop new ideas in SQL.
How many times have you started off building a complicated analytical SQL query like this? SELECT … . uh??? . . SELECT * FROM … SELECT … And you get stuck trying to figure out exactly what you want to select. You’re thinking about averages, group by’s, the order of your results or some change you want to see over time and the query editor is just sitting there, taunting you, because in SQL, you have to know up front what you want to select into your final results.
If you are a software engineer and you have just enough SQL to write queries that count, sum, average join and maybe sub-select, then I’m writing this post for you. If, when you need more complicated analysis or computation, you pull your query results into excel or your favorite scripting language to do more processing, then I have some good news.
There’s a whole lot more you can do right in SQL and it’s not too bad to learn how to do it.
In this post, I’m going to cover a few concepts that have recently helped me do more computation, better analysis, and it turns out, more efficient querying…and I hope they help you too.