PostgreSQL mistakes and how to avoid them with Jimmy Angelakos
May 02nd, 2024 | 15 min read
Table Of Contents
This interview is part of the simplyblock’s Cloud Commute Podcast, available on Youtube , Spotify , iTunes/Apple Podcasts , Pandora , Samsung Podcasts, and our show site .
In this installment of podcast, we’re joined by Jimmy Angelakos (X/Twitter) , a freelance consultant, talks about his experiences with customers running PostgreSQL on bare-metal, in the cloud, and on Kubernetes. He also talks about his new book ” PostgreSQL Mistakes and How to Avoid Them “.
Chris Engelbert: Welcome back everyone. Welcome to the next episode of simplyblock’s Cloud Commute podcast. Today I have a very interesting guest, very different from the other ones before, because he’s actually an author, writing a book right now. Well, I think he already published one or two at least. But he’ll talk about that himself. Welcome, Jimmy.
Jimmy Angelakos: Hi, very nice to be here.
Chris Engelbert: Very nice. Thank you for being here. Maybe we just start simple with the basic stuff. Who are you? Where are you from? What do you do for a living? Except for writing a book.
Jimmy Angelakos: My name is Jimmy Angelakos, which is obviously a Greek name. I live in Edinburgh in Scotland. I’ve been working with Postgres for maybe around 16 years now, exclusively. I haven’t used any other database in 16 years in a professional capacity. Naturally, the time came to share my experiences and I wrote a couple of books on this. Well, I actually co-wrote the ” PostgreSQL16 Administration Cookbook ” with my lovely co-authors Boris Mejías, Gianni Ciolli, Vibhor Kumar, and the sadly departed Simon Riggs, who was an awesome fellow. I’d like to pay a little tribute to him as a person, as a mentor to the entire Postgres community. He will be greatly missed.
Chris Engelbert: Thank you very much. I appreciate you sharing that because I think it was last week at the time of recording. It is a sad story for the Postgres community as a whole. Thank you for sharing that. From your professional life, for the last couple of years next to writing books, I think you’re mostly working as a consultant with a couple of different companies and customers. What do you think is the most common task? I mean, you’re probably coming in to help them optimize Postgres, optimize queries.
Jimmy Angelakos: Right. I’ve done all sorts of things in the past few years, like training customers to use Postgres in general, training them to use Postgres in a specific way that is suited to their needs. I have provided support to customers who ran Postgres, and also professional services like consulting. I can’t really say what the thing they use the most is or they request the most, but I can tell you a few of the things. Some customers come in and say, “My queries aren’t running well. What can I do?” It’s like the most frequent thing you hear. Some other people say, “Tell me what hardware to buy for Postgres.” You tell them, “I can’t really give you a response because it really depends on your workload,” which is the most important factor, I think, with databases. Everyone uses them differently. If it’s a database that is widely used as Postgres with so many use cases and so many different ways to use it, you can do analytics on it. To an extent, you can use it for transaction processing (OLTP), you can use it as a document database, with JSONB. There’s all sorts of things you can do. There’s no good answer to the things that people ask like, “Give me the best tuning parameters for Postgres,” or “How to write a query the right way.” It really depends on the amount of data you have, the type of data you have, and the sort of queries you’re going to be running.
Chris Engelbert: Yeah, that makes a lot of sense. It’s not only for the Postgres community or for Postgres. That is very true for a lot of things. From my own personal background, with a lot of programming languages or runtime environments, people ask, “What is the optimized or the optimal way of configuring it?” And they’re like, “I don’t know. Can’t give you the answer.” So, yeah, I hear where you’re coming from. All right, so… Sorry, I’m still having a little bit of a flu. So, from your personal background, you said you’ve co-written one book, but I also hinted on the fact that you’re writing another book right now, and I looked a little bit into it because it’s on Manning and it has Early Access, which is nice. But maybe you can give us a little bit of an insight of what you’re writing about.
Jimmy Angelakos: Right. So, the book that is under construction is called PostgreSQL Mistakes and how you can avoid them. So, it’s a bit of an anti-how-to. So, for people that are used to how-to books, like, “How do I partition? How do I do this? How do I do that?” It’s a bit of the other way around. I was trying to do this, but things went wrong. So, it’s experiences that I’ve collected from the things I’ve seen our customers do or the things I’ve done in the past.
Chris Engelbert: Right.
Jimmy Angelakos: And it’s really important to learn from mistakes. Everyone makes mistakes. And Postgres is very particular in how it wants things done. So if you get it right, the database is fantastic. It works very well with excellent performance. And when you start to do things a different way, you can see different results. And that’s basically the whole idea. There’s three chapters. Three chapters up on the web now. And there’s a huge fourth chapter that’s being published as we speak. That has anti-patterns that are not really restricted to Postgres. It’s things like, don’t improvise, don’t create your own distributed systems. There’s people that have spent hundreds of thousands of hours working on these problems, and you don’t need to reinvent the wheel.
Chris Engelbert: I hear you. As you said, there’s three chapters out right now. I haven’t seen the fourth one yet, so I think I have to look into that right after the recording.
Jimmy Angelakos: Manning are in the process of publishing it as we speak.
Chris Engelbert: All right, cool. But so far, I really like the second chapter and you bringing up all of the SQL code examples and showing the execution plans. And I think just by saying the word execution plan or the term execution plan, I probably lost half of the audience right now. So maybe you can give them a little bit of a feeling of what is an execution plan? Why is it so important to understand those things?
Jimmy Angelakos: Yeah, so Postgres has a quasi-intelligent query planner, which basically examines the way your query is written and produces a plan on how it’s going to get executed by the database server. It’s like, oh, they wrote this, where, this, and that, and it looks like a join. So I’m going to perform a join of these tables and then I’m going to order the results in this. So that’s the execution plan. It’s basically telling you how the database is going to execute your SQL query. Now, the planner takes into account things such as how much memory do you have or how fast are your disks that you’ve already specified in the Postgres configuration. It also takes into account things like what’s the nature of the data? What’s the cardinality, let’s say, in your tables? And these are things that are updated automatically by Postgres itself in its statistics tables. So it produces, most of the time, a really good plan. And what is a good plan? It’s the cheapest plan in terms of arbitrary cost. And arbitrary cost is calculated using those factors that I just mentioned. And it iterates through many plans for the execution, chooses the cheapest one, which will probably end up being the fastest one to execute in real-world terms. And seeing the execution plans is key to understand why your queries are running well or why they’re running slowly. Because then you can see, ah, this is what Postgres was trying to do. So maybe I should force its hand by writing this slightly differently.
Chris Engelbert: Yeah, that’s true. I think my personal favorite example is a common table expression, which ends up being a join because the query planner understands now a join is actually better. I don’t need to do the temporary heap table to store the intermediate result. So we kind of hinted where people can find the early access version. It’s at Manning. Do you want to add anything more to that? Maybe have a specific simple URL or something where people can find it.
Jimmy Angelakos: I can share the URL, but I certainly cannot spell it out.
Chris Engelbert: Ok, that’s fair enough. We’re going to put it in the show notes. That’s totally fine.
Jimmy Angelakos: Thanks very much. Yeah, I think it’s going to be an interesting book because it’s real-world use cases. And where it isn’t a real-world use case, it’s close enough. And I will tell you so in the text.
Chris Engelbert: That is true. And I agree. As I said, I’ve well kind of read through the first three. I read as much as I had time, but I really enjoyed it. And many of those code examples you brought up, as I said, especially in the second chapter, they were like, yes, either I’ve been there or I had people helping with that as well. I’ve worked for a Postgres based startup in the past. And we had people asking pretty much the same questions over and over again. So yes, for everyone using Postgres or starting using Postgres, it’s probably a pretty, pretty good pick.
Jimmy Angelakos: Thank you. I appreciate that. Yeah, as you know, people are familiar with other databases because Postgres has most recently exploded in popularity. It was kind of a niche database for a few years. And now it looks like all the enterprises are using it, all the hyperscalers are starting to use it, like AWS, Google, Azure. This means that they have recognized the value that Postgres brings to the table.
Chris Engelbert: Yeah, I agree. And I think it’s kind of interesting because you kind of hinted at that earlier. But you can do a lot of things with Postgres. There is a lot of stuff in Postgres itself. If you want document database, you have XML and JSON. If you want key value, you have hstore. But there is also a really good extensibility to Postgres, giving you the chance to plug everything else in, like time series, graph databases. I don’t know what else. You probably could define Postgres as the actual really only in the world multimodal database.
Jimmy Angelakos: Right, yeah. And we were actually considering of changing the description of Postgres on the website, where you go in and it says it’s an object relational database, which is kind of a formal, traditional way to put it. But nowadays, you’re right. I think it’s more of a multimodal database. And I think that is also the term that Simon Riggs preferred. Because it does all of these things and can also let you do. Things that the developers of Postgres hadn’t even thought of because of the extension system. Like a very famous extension is PostGIS, which is the GIS (geospatial) capabilities for Postgres, and is now considered the gold standard in geographical databases.
Chris Engelbert: True.
Jimmy Angelakos: From an open-source extension to an open-source database. And there’s like thousands of people that are professionally employed to use this extension in their day jobs, which is amazing.
Chris Engelbert: True. I agree. So let me see. Let me flip back a little bit. I mean, we’re officially a cloud podcast. We talked a lot about the cool Postgres world. And I was part of a Postgres world. I was part of the Java world. So that is mostly the guests I had so far. But because we’re a cloud podcast, what do you think, like working with all the different customers, what is your feeling? Like how many people are actually deploying Postgres in the cloud, in Kubernetes, in EC2, or anything like that?
Jimmy Angelakos: Well, the company I’m working with right now are using it on RDS. They’re using RDS Postgres because it suits their use case better in the sense that they don’t have a team that wants to worry about replication and backups and things like that. And availability zones, they want that handled as a service. And that fits their use case quite well. When you want more flexibility, you can still use the cloud. You can run, for example, Postgres on Azure boxes or EC2 boxes or whatever you want. But then you have to take care of these things yourself.
Chris Engelbert: Right.
Jimmy Angelakos: But it still gives you freedom from having to worry about hard drives and hardware and purchase orders and things like that. You just send off a check every month and you’re done. Now, Kubernetes is an interesting case. There’s a couple of operators for Postgres. The most recent one is Cloud Native PG, which is starting to get supported and getting traction from the Cloud Native Computing Foundation, which is great. And they are trying to do things in a different way that is totally cloud-native. So everything is defined as a resource in Kubernetes. But the resources map to things that are well known in Postgres, like clusters and nodes and backups and actual things so that you don’t have to perform black magic like running it in a pod, but also having to configure the pod manually to talk to another pod that is your replicant, things like that. And there are other operators that have evolved over time to approximate this ease of use. I think the Crunchy Data Operator comes to mind. It started off being very imperative. They had a command-line utility that created clusters and so on. And now they’ve turned it into a declarative, which is more cloud-native, more preferred by the Kubernetes world.I think these two are the major Postgres things that I’ve seen in Kubernetes, at least that I’ve seen in use the past few years. There are still things that haven’t been sorted because, as we said, Postgres is super flexible. And this flexibility and the ease of use of Kubernetes, where everything is taken care of automatically, comes at a cost. You have reduced flexibility when you’re on Kubernetes. So there’s things that haven’t been totally worked out yet, like how do you one-click migrate from a cluster that is outside Kubernetes to something that is running in Kubernetes? Or can you take a backup that was produced elsewhere and create a cluster in Kubernetes, a Postgres cluster from that backup? Now, once they have these things sorted and also hardware support is very important when you’re talking to databases, I think we’ll see many more people going to Postgres on Kubernetes in production. But specifically hardware and specifically disk performance and throughput and latency, you have to get into the hardware nitty-gritty of Kubernetes to take maximum advantage of Postgres because as a database, it loves fast disks. Generally speaking, the faster your disk, the faster Postgres will go.
Chris Engelbert: That is true. And just like a shameless plug, we’re working on something. But because we’re running out of time already, 20 minutes is always so super short. What do you think is going to be the next thing for Postgres, the database world, the cloud world, whatever you like. What do you think is the next thing?
Jimmy Angelakos: I can’t give you an answer, but you can go search on YouTube and you can find Simon’s last contribution to the Postgres community. He gave a talk at PostgreSQL Conference Europe last December where he said ” Postgres, the next 20 years ” or something to that effect. And he predicted things, how things will go for Postgres in the future and future directions. That’s a very interesting talk for anyone who wants to watch that. I wouldn’t want to hazard a guess because I’ve seen people just blindly accept the thing that AI is the next big thing. And everything in Postgres and databases and Java and Python is going to revolve around AI in the future. That remains to be seen.
Chris Engelbert: I like that because normally I start to say, please don’t say AI. Everyone says that. And I think AI will be a big part of the future, but I agree with you. It remains to be seen how exactly. Yeah, thank you very much. We’re going to put the video link in the show notes as well for everyone interested. And yeah, Jimmy, thank you very much. It was a pleasure having you.
Jimmy Angelakos: Thanks very much. I appreciate the invitation.
Chris Engelbert: My pleasure. And for the audience, we’re going to see or hear us next week. And thank you very much for being here.