Getting Started with Graph Databases with Jennifer Reif from Neo4j

Jul 12th, 2024 | 26 min read

Introduction

This interview is part of the simplyblock Cloud Frontier Podcast, available on YouTube, Spotify, iTunes/Apple Podcasts, and our show site.

In this episode of the Cloud Commute podcast, host Chris Engelbert interviews Jennifer Reif, a Developer Advocate at Neo4j. Jennifer delves into the fundamentals of graph databases, explaining how they differ from traditional relational databases and why they are uniquely suited for specific use cases. If you’re curious about graph databases and their practical applications, this episode is a must-listen.

Key Takeaways

Why Isn’t SQL the Right Fit for Graph Databases?

SQL is designed for querying relational databases, where data is organized in tables. It is powerful for tabular data, but queries that follow several relationships in a row need chains of joins or recursive queries, which quickly become hard to write and maintain. Graph databases like Neo4j are optimized for deeply interconnected data, where relationships are as crucial as the entities themselves. In these scenarios, a graph query language like Cypher, whose syntax draws nodes and relationships as arrow-like patterns, makes such queries simpler to write and read, and the graph engine can execute them efficiently.
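
To make the contrast concrete, here is a minimal sketch of a multi-hop lookup in SQL, run through Python's built-in sqlite3 module. The `person` and `knows` tables, the sample names, and the three-hop limit are illustrative assumptions and were not discussed in the episode. The Cypher pattern in the comment is one common way to phrase the same question in a graph query language; it is shown for comparison only and is not executed.

```python
import sqlite3

# Hypothetical sample data: who knows whom (directed relationships).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE knows (src INTEGER NOT NULL, dst INTEGER NOT NULL);
    INSERT INTO person VALUES
        (1, 'Alice'), (2, 'Bob'), (3, 'Carol'), (4, 'Dave'), (5, 'Erin');
    INSERT INTO knows VALUES (1, 2), (2, 3), (3, 4), (4, 5);
""")

# Relational version: a recursive common table expression that walks
# the knows table up to three hops away from the starting person.
REACHABLE_SQL = """
WITH RECURSIVE reachable(id, hops) AS (
    SELECT k.dst, 1
    FROM knows AS k
    JOIN person AS p ON p.id = k.src
    WHERE p.name = ?
  UNION
    SELECT k.dst, r.hops + 1
    FROM knows AS k
    JOIN reachable AS r ON k.src = r.id
    WHERE r.hops < 3
)
SELECT DISTINCT p.name
FROM reachable AS r
JOIN person AS p ON p.id = r.id
ORDER BY p.name
"""

# Roughly equivalent Cypher, for comparison only (not executed here):
#   MATCH (:Person {name: 'Alice'})-[:KNOWS*1..3]->(other:Person)
#   RETURN DISTINCT other.name ORDER BY other.name

names_within_three_hops = [row[0] for row in conn.execute(REACHABLE_SQL, ("Alice",))]
print(names_within_three_hops)  # ['Bob', 'Carol', 'Dave']
```

The SQL version has to spell out the recursion, the join back to `person`, and the hop counter by hand, while the graph pattern states the path shape directly.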

What are Graph Databases Used For?

Graph databases are particularly effective wherever data is highly interconnected, such as social networks, supply chains, and fraud detection, because they let users navigate and query those relationships efficiently. Neo4j, for example, was instrumental in analyzing the Panama Papers, where journalists used it to uncover hidden relationships between entities in a massive dataset.

What is the Difference between a Graph Database and a Data Table?

A graph database stores data as nodes (entities) and edges (relationships), allowing for a more flexible and intuitive representation of complex data structures. In contrast, a data table in a relational database organizes data into rows and columns, which can become cumbersome when dealing with intricate relationships. Graph databases eliminate the need for extensive joins and complex queries, making it easier to explore and extract value from interconnected data.
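
The node-and-edge model can be sketched in a few lines of Python. This is only an in-memory illustration of the idea, not how Neo4j stores data on disk; the `Node`, `Relationship`, and `connect` names and the sample data are assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """An entity with a label and free-form properties."""
    label: str
    properties: dict
    # Relationships that start at this node; excluded from repr/eq
    # because relationships point back at their nodes.
    outgoing: list = field(default_factory=list, repr=False, compare=False)


@dataclass
class Relationship:
    """A typed, directed connection that is stored as data in its own right."""
    type: str
    start: Node
    end: Node
    properties: dict = field(default_factory=dict)


def connect(start, rel_type, end, **properties):
    """Create a relationship and attach it to its start node."""
    rel = Relationship(rel_type, start, end, properties)
    start.outgoing.append(rel)
    return rel


alice = Node("Person", {"name": "Alice"})
acme = Node("Company", {"name": "Acme"})
connect(alice, "WORKS_AT", acme, since=2021)

# Reading the relationship back is a direct lookup on the node, not a join.
employers = [
    rel.end.properties["name"]
    for rel in alice.outgoing
    if rel.type == "WORKS_AT"
]
print(employers)  # ['Acme']
```

In a table-based model, the same fact would typically live in a separate join table keyed by two foreign keys, and reading it back would mean joining three tables.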

EP20: Getting Started with Graph Databases with Jennifer Reif from Neo4j

In addition to highlighting the key takeaways, it’s essential to provide deeper context and insights that enrich the listener’s understanding of the episode. By offering this added layer of information, we ensure that when you tune in, you’ll have a clearer grasp of the nuances behind the discussion. This approach enhances your engagement with the content and helps shed light on the reasoning and perspective behind the thoughtful questions posed by our host, Chris Engelbert. Ultimately, this allows for a more immersive and insightful listening experience.

Key Learnings

How do Graph Databases Work?

Graph databases store data in nodes and edges, representing entities and their relationships. This structure allows for efficient querying of complex, interconnected data. Unlike relational databases, which require multiple joins to traverse relationships, graph databases can quickly navigate through connected nodes, making them ideal for applications with deeply nested relationships.
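
As a rough illustration of what "navigating through connected nodes" means, the sketch below runs a breadth-first traversal over a small adjacency list. The `FOLLOWS` data and the `hops_from` helper are hypothetical, and a real graph database does far more (storage, indexing, query planning), but the core idea of following direct references from node to node is the same.

```python
from collections import deque

# Hypothetical follower graph stored as an adjacency list: each node
# holds direct references to its neighbours.
FOLLOWS = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave", "erin"],
    "dave": ["frank"],
    "erin": [],
    "frank": [],
}


def hops_from(graph, start, max_hops):
    """Breadth-first traversal returning {node: hop distance} within max_hops."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if distance[current] == max_hops:
            continue  # do not expand past the hop limit
        for neighbour in graph.get(current, []):
            if neighbour not in distance:
                distance[neighbour] = distance[current] + 1
                queue.append(neighbour)
    del distance[start]  # the start node is not part of the result
    return distance


print(hops_from(FOLLOWS, "alice", 2))
# {'bob': 1, 'carol': 1, 'dave': 2, 'erin': 2}
```

Each extra hop only touches the neighbours of nodes already reached, which is why deep traversals stay manageable compared with adding another join per hop.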

Simplyblock Insight:

While graph databases handle relationships efficiently, simplyblock provides the necessary infrastructure to ensure that these databases perform optimally in cloud environments. By offering reliable storage and high availability, simplyblock supports the scalability and resilience needed for managing large, interconnected datasets.

How are Graph Databases Implemented?

Graph databases are implemented using graph data models, where nodes represent entities, and edges represent the connections between them. Jennifer mentions that Neo4j uses a native graph processing engine, which allows for efficient querying and storage of graph data. This native approach ensures that graph operations are optimized, reducing latency and improving performance compared to non-native graph solutions.

Simplyblock Insight:

Implementing graph databases on platforms like Kubernetes is simplified with simplyblock’s storage solutions, which ensure that data persistence and recovery are handled seamlessly. Whether using Helm charts or Operators, simplyblock’s infrastructure ensures that Neo4j and other graph databases can be deployed and managed with minimal operational overhead.

What are the Practical Applications of Graph Databases?

Graph databases are widely used in areas where understanding relationships is key, such as social networking, fraud detection, recommendation systems, and supply chain management. Jennifer highlights how these databases allow organizations to uncover hidden patterns and insights by exploring the connections between data points, which would be difficult or impossible to achieve with traditional databases.
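
One way to picture "uncovering hidden patterns" is a simplified fraud-detection check: accounts that share a phone number or device, directly or through a chain of other accounts, end up in the same connected group. The sketch below is an assumption-laden toy in plain Python; the account names, identifiers, and the `linked_account_groups` helper are invented for illustration and are not taken from the episode.

```python
from collections import defaultdict

# Hypothetical accounts and the identifiers they registered with.
ACCOUNT_IDENTIFIERS = {
    "acct-1": {"phone:555-0100", "device:A"},
    "acct-2": {"phone:555-0100", "device:B"},
    "acct-3": {"device:B"},
    "acct-4": {"phone:555-0199", "device:C"},
}


def linked_account_groups(account_identifiers):
    """Group accounts connected through any chain of shared identifiers."""
    # Build a two-sided graph: accounts on one side, identifiers on the other.
    neighbours = defaultdict(set)
    for account, identifiers in account_identifiers.items():
        for identifier in identifiers:
            neighbours[account].add(identifier)
            neighbours[identifier].add(account)

    seen = set()
    groups = []
    for account in sorted(account_identifiers):
        if account in seen:
            continue
        # Depth-first walk over everything reachable from this account.
        stack, component = [account], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(neighbours[node] - seen)
        # Keep only the accounts, dropping the identifier nodes.
        groups.append(sorted(n for n in component if n in account_identifiers))
    return groups


print(linked_account_groups(ACCOUNT_IDENTIFIERS))
# [['acct-1', 'acct-2', 'acct-3'], ['acct-4']]
```

Here `acct-1` and `acct-3` share nothing directly, yet they land in the same group through `acct-2`, which is the kind of indirect connection that is awkward to find with fixed-depth joins.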

Simplyblock Insight:

Simplyblock’s platform complements these applications by providing a robust infrastructure that ensures high availability and consistent performance, even under heavy query loads. This makes it possible to apply graph databases in mission-critical applications where downtime or performance degradation is not an option.

Additional Nugget of Information

How do Graph Databases Handle Scalability in Large, Distributed Environments?

As datasets grow and become more interconnected, the ability to scale a graph database efficiently becomes crucial. Graph databases like Neo4j are designed to handle scalability challenges by distributing data across multiple machines in a cluster while maintaining the integrity of the relationships between entities. This distributed approach allows graph databases to manage large volumes of data without sacrificing performance, making them well-suited for enterprise-level applications.

Conclusion

Jennifer Reif offers a comprehensive introduction to graph databases, highlighting their strengths and how they differ from traditional relational databases. She emphasizes that graph databases, like Neo4j, are powerful tools for managing and querying complex relationships in data, making them invaluable in various industries. As the technology landscape continues to evolve, graph databases are poised to play a crucial role in applications where understanding relationships is key.

Whether you’re new to graph databases or looking to deepen your understanding, this conversation provides valuable insights into how they work and why they are increasingly important in today’s data-driven world. Be sure to tune in to future episodes of the Cloud Commute podcast for more expert discussions.

Full Video Transcript

Chris Engelbert: Hello. Well, welcome back everyone. Welcome back to the next episode of simplyblock’s Cloud Commute Podcast. This week I have another incredible guest. I know I say that every single time. It’s just true. They’re all incredible. And you know that, right? So, hello Jennifer. Um, maybe just give us a quick introduction real quick.

Jennifer Reif: Sure. My name is Jennifer Reif. I’m a developer advocate at Neo4j, focusing on Java technologies and its ecosystem. So I cover the gamut on almost anything Java. I’ve worked at Neo4j since 2018, let me put it that way. It’s been a little bit. I show up at some conferences, write blog posts, do videos, and contribute to Neo4j’s podcast, graphstuff.fm. I also work on code demo projects and presentations, the whole nine yards. So I am happy to be here to talk to Christoph and chat a little bit about technology and Neo4j and so on.

Chris Engelbert: Awesome. I think you’re actually the first person who ever said Christoph on the stream. So now people know how I’m really called. Chris is fine. It’s so much easier for the rest of the world, but you said you’re working for Neo4j. Obviously, I know what Neo4j is, but maybe just give the others a quick introduction. What is cool about it? What is it? And you know the spiel.

Jennifer Reif: Sure. Neo4j is a graph database. And I guess to start off, just like any other database, it stores data. A lot of people will say, “oh, is it a layer on top of another type of database?” No, it actually is a storage system. You store the data, it writes to disk, and the whole gamut there. But it stores data differently than rows, tables, documents, so on. It stores data as entities and then relationships between them. So you actually write the relationships to the database. That makes it really easy to read those relationships back.
So anything where you have a lot of complex relationships or a lot of relationships and a lot of hops through different types of data, a graph database is going to be optimized and more performant for those types of queries.

Chris Engelbert: Right.

Jennifer Reif: So lots of things like networks or social network structures, supply chains, where you have a lot of depth and hopping around, even just fraud detection and there’s a variety of different use cases, software dependencies, lots of other things. So I’ve seen it used for kind of hit or miss just kind of random things that it’s like, “oh, I would have never thought to use a graph for that,” but it works really, really well for any type of case where you have a lot of relationships and a lot of connections in your data.

Chris Engelbert: So that’s interesting. I think the weirdest thing that I’ve built, and at the same time, the most efficient thing was actually a permission system, with inheritance, and roles and permissions and inheritance between the different roles, because you can basically make a single, like, Cypher request and say, “give me every permission that is somehow in the hierarchy or in the inheritance graph, and remove everything that might be overridden” as, what is the term, uh, is it out, um, uh, denied. That’s it. Yeah. Blocked or denied. I like that. So that was, that was really nice. And it was so much easier than, than doing like a graph, or like a table tree kind of recursive SQL lookup on a relational database. Um, yeah, I think I still have the code somewhere.

Jennifer Reif: That would be really cool. You should publish that somewhere or like, you know, highlight somewhere.

Chris Engelbert: I can try to find it and, um, well, let’s see if I, maybe I hand it to you.

Jennifer Reif: Yeah. I’ve seen some genealogy or like family tree type of scenarios.

Chris Engelbert: In just a couple of lines, it was like, I think, uh, three types and four relationships or something, and you’re done.
It was brilliant. Anyway. So you said it’s a graph database and you gave a couple of ideas what a graph database could be used for. And well, I hinted on why graph databases might be easier. Right. So especially when you do like topology or any kind of relation lookups, you said social networks, parent or family trees, anything like that, where you have relations, especially like when you look at European history, like between the different kings’ families, and there’s a lot of connections and relations between almost all families.

Jennifer Reif: Yeah.

Chris Engelbert: So if you’re trying to understand or to look into those kinds of things, graphs are super helpful and much easier. But what would you say is like the biggest difference from a, from a typical database, for example, like a relational database, except you said that Neo4j or graph databases store it slightly different.

Jennifer Reif: I’m slightly biased. So I have a long list of things I love about a graph database over other things. But if I had to narrow it down to just one, the thing that I find most impactful is that you don’t need to have expert knowledge about the data model in order to pull valuable data from a graph database. So you had mentioned, you know, you have a few different types of relationships. You don’t have to know what those relationships are going into the graph database, you say, “hey, look, I know I have these entities, find all the ways they’re connected and remove the connections that are, you know, the denials or the denied or blocked or whatever credentials or access paths,” and you can filter those types of relationships out. And with a relational database, sure, that’s probably possible, but the amount of work and the amount of knowledge you have to have upfront, first of the data model and second of SQL, in order to handle those very complex filterings and like sub queries and so on is a lot higher. That learning curve is a lot higher.
Um, so that’s the thing that I love most about graph databases is the data model itself is not required to know it upfront well, and then it’s naturally very visual. So it’s just easier to navigate and easier to just explore without having this massive learning curve upfront to know the data.

Chris Engelbert: I love that. Um, specifically as far as I remember Neo4j was involved in a lot of like analytical use cases, uh, towards things like the Panama Papers, right? As far as I remember Panama Papers, like the whole network was basically put into Neo4j and then the journalists started analyzing this massive graph and how all those companies worked together. And that is exactly what you said, right? You don’t have to understand or have to know yet how those things are connected or is it people, is it companies that somehow work together that make the relation? Um, you figure that out while you’re looking at the data and while you’re looking at the graph and trying to understand what that means.

Jennifer Reif: Yeah. My favorite thing is to just take a data set that looks interesting to me. Dump it into Neo4j and then just start querying and see what interesting things I find from it. And then that’s what I end up focusing on and playing around with where I feel like a relational database, it’s almost the opposite. Um, you have to really kind of figure out and look at the data and the spreadsheets or whatever, you know, data format you have and figure out, “okay, what does the structure look like? How can I make the connections from one hop to the next table and so on?” And a graph is a little bit of the reverse there. Yeah.

Chris Engelbert: Well, I’m not sure, is it a general graph database thing or is that very specific to Neo4j, because you don’t necessarily need a schema.

Jennifer Reif: Yeah. I know there are some other graph databases that kind of have that optional schema, schemaless, schema free, however you want to term it.
And Neo4j is not the only one in that category. But I feel like just the length of time that Neo4j has been around that, you know, we kind of have like a leg up on a lot of the other graph databases, so those that do provide that capability. Um, it’s just a really nice feature.

Chris Engelbert: Right. Yeah. I’m asking because I think for relational databases one of the criticisms or points that people always talked about and the whole like NoSQL thing where it came from was like, you don’t want the schema. You want this kind of schemaless, you have an optional schema and if the schema can evolve over time, but with SQL database, or at least relational database, not necessarily SQL, but relational database, you have to come up with like relational model upfront and define it. Um, and I think that is where a lot of like the problems come when you have an unknown dataset and a very complex dataset, if it evolves over time, it’s probably fine, but when you get something it’s probably much more complicated. So as a developer, I mean, I’m coming from a relational world. Um, so I’m a Postgres developer, but I understand I may need a graph database like Neo4j. So how would I get started with that?

Jennifer Reif: Well, one of the best ways we have currently is our database as a service, um, called Aura, Neo4j Aura. Um, and we have free instances. So we have, you know, different tiers, of course, uh, we have a free tier and then kind of your paid tiers above that, depending on your, on your needs there. But the free tier is a really great place to start. Um, there’s lots of tools surrounding that free tier. So they have like a data importer tool where you can dump, you can load up like PDFs or, or CSVs or some other different types of data and it will kind of help you get that data into a graph. So you don’t have to have that knowledge upfront.
And then you can kind of query or play around with our visualization tool called Bloom, and it kind of is a natural language query interface. So you don’t have to know a lot of Cypher upfront. Um, even the Cypher portion of it, there’s guides that kind of walk you through, and so it’s just a, we try our best to have a very low barrier to entry pathway there for people to learn.

Chris Engelbert: I think the… You mentioned Cypher, the thing that makes Cypher from my perspective, so much better than the other graph languages is that it actually looks like ASCII art. It looks beautiful. You look at the query and at some, if you go a little bit deeper and use some of more complex constructs, it’s a little bit more complicated to understand if you don’t know how it works, but like a standard graph query over multiple nodes and relationships, you look at that and it’s an arrow telling you, “oh, here’s a node, here’s the relationship, and that’s what I expect, and that is how many you can have between those.” I just love it. Whoever came up with Cypher. Thank you. Thank you for the love of God.

Jennifer Reif: It’s a super approachable query language. I feel like I had learned several years of SQL before I even knew about Cypher, um, and when I came over to the light side, if you will, at Neo4j, um, and started exploring Cypher, there were several things that it’s like, “why in the world isn’t everybody, you know, using something like this?” Because it’s very easy to read, very easy to construct, at least kind of the general starting structures, right? Um, there’s way more complex things you can do with it. And there’s still lots of things I look at it and go, okay, “how do I do this pattern, you know, construction and manipulation?” Um, because patterns are very complex. Um, but yeah, just at the outset, it’s a much more approachable language. I feel like and has some really cool fun things to do with it.
And I always like to give the example that it took me learning Cypher in order to understand what the SQL HAVING and GROUP BY clause was trying to do. Um, it was just way more apparent in Cypher than in SQL.

Chris Engelbert: I agree. And I think, and that is where a graph database comes in in general, as I said earlier, in SQL, when you have those like multi hop relationships, you end up doing something like this weird recursive SQL. It works, but it’s never going to be nice. It’s a recursive common table expression, with the union and a join and I have to look it up every single time I have. I’ve used it so many times. I always get like 95% to where I want to be. And then it just doesn’t work the way I expected. And I have to look it up and I probably made some mistake on the join type or on the join clause. And with Neo4j or in general with graph database and specifically Cypher, it is so much easier to model that stuff, even when you use a merge or something, it’s still way easier.

Jennifer Reif: And for those of you who are not familiar with Cypher or thinking that this is a Neo4j thing. Um, first of all, we have OpenCypher, which is completely open source. We open sourced it, I believe back in 2015, but just this year, Neo4j and several other graph database vendors all got together and came up with the ISO GQL standard, “Geequel standard”, that was released, I think like a month, month and a half ago now. And so there is an official Graph query language standard now that Cypher has poured a lot into that as well. Um, there’s a lot of things that have, have come over from Cypher as well as some other graph query languages too. So it will be an official, like unified standard. Of course, whenever, when everybody can kind of get to that.

Chris Engelbert: An ISO standard.

Jennifer Reif: Yep.

Chris Engelbert: Wow. I did not expect to see that in my lifetime. That is incredible.

Jennifer Reif: It’s been several years in the making.
And Neo4j and all the other graph database vendors have been hard at work getting that all together, but yeah, it all got approved and everything. Just recently.

Chris Engelbert: So how does it work from a programming language perspective? Um, I know that Neo4j has a lot of drivers, obviously it’s not a SQL interface, so you need something different than for example, in Java JDBC or in Go, the scan interface. But I think there’s drivers for almost every language I’ve ever considered.

Jennifer Reif: Yeah. We provide official drivers for like the bulk of your core languages, and then there’s community drivers that are very well supported, very well maintained by partners or communities or so on for several other languages, and then we also do have like a JDBC driver and other things too, as well as integrations to major frameworks. So like our Spring Data Neo4j integration has been around forever. Um, and several others as well. And of course, you know, we have like the big GenAI ones now, your LangChains, your LlamaIndex, and so on too. So, basically anything you want to integrate with or around Neo4j has some kind of connector integration or driver or something to do with it.

Chris Engelbert: All right. Cool. You already mentioned Neo4j Aura. And as far as I know, we’re a cloud podcast, but we’re also Kubernetes podcast. As far as I know, Neo4j Aura internally uses Kubernetes, right?

Jennifer Reif: Yes. As far as I know. Yep. Kubernetes is the thing.

Chris Engelbert: Okay. So we’re probably on the same level of understanding.

Jennifer Reif: Yeah, there may be some other things they do as well, but yes, we run Kubernetes and we have a very good integration and partnership there.

Chris Engelbert: Okay. So that means I can also use Neo4j on Kubernetes outside of Aura.

Jennifer Reif: Yeah.
The thing that, at least I didn’t realize and still like, until I started digging in just a little bit, is running a database on Kubernetes is not as simple as spin up X database. Um, there’s a lot of, you know, because…

Chris Engelbert: If you don’t care for persistence, yes.

Jennifer Reif: Right. Kubernetes is very customized because typically you’re dealing with enterprise systems and you need to mess or customize with individual components or pieces. So running Neo4j requires about four or five different components that technically run or would run separately on Kubernetes. And so, if you’ve ever heard of Helm and Helm Charts, that’s the easiest way to basically just outline, you know, these are the services, the pieces that I need in order to run Neo4j, spin all these up together and manage them this way and replicate them this way. Um, and so it’s actually pretty easy to get up and running with the Neo4j-provided, managed, supported Helm chart.

Chris Engelbert: Interesting. So the reason I’m saying interesting is because everyone these days talks about Kubernetes Operators and “we have the Operator to set it up for you” and you say “no, use the Helm chart.” And it’s like, it’s so refreshing. I haven’t heard that in a while. I think the reason is that Operators give you a lot more like operational… Well, you can react at runtime to certain situations where the Helm chart is basically just the installation. I think that is the reason why a lot of people use or move towards the Operator. Um, but that’s just my guess. Um, maybe it’s just like cool to have an Operator these days.

Jennifer Reif: The latest thing.

Chris Engelbert: Yeah. So let me see. We talked about developers, we talked about the programming languages, we know you can run it on Kubernetes. Um, make sure you have a persistent volume if you run a database. We talked about that.

Jennifer Reif: Yep.
Chris Engelbert: And if you need a persistent volume provider, I heard that simplyblock might have something for you. Um, but there’s a lot of others as well. Actually just yesterday, or on the weekend, I started a small website where you can look for all the different CSI providers. Basically the volume providers that can be plugged into Kubernetes, everything that I know and found, and I split them by features and you can search. So, if you’re in the search for a CSI provider, storageclass.info is probably what you want to look into. If you find something that is wrong, feel free to send a pull request. It’s GitHub Pages. Just like as a side note. Okay, because we’re pretty much out of time. What do you think is the next big thing in cloud, in graph database, in databases in general, in AI, feel free to name two or three things as well.

Jennifer Reif: Yeah. Well, I think, you know, AI is kind of or it’s kind of the big thing right now, but I think we’ll start seeing that not necessarily taper off, but we’ll start seeing that integrate into, kind of just our standard day to day, rather than that, I think being the focus for everything. Um, I think we’ll kind of see, you know, us not go back to, but kind of modify what was our workflow to integrate LLMs and GenAI stuff into our day to day things. Um, and so it will become just a piece of the deployment puzzle or, you know, building a puzzle or application puzzle, whatever it is. And so I think that will kind of get standardized a little bit better. We’ll kind of figure out where the super useful applications are and the highly critical and impactful workflows that we need to use it. Um, and so I think databases are going to be a huge component of that. Whether it’s, you know, graph or something else entirely, we’re seeing this shift from, “okay, use LLM for everything,” realizing that LLM has some limitations, right.
And some, and some weaknesses, but I think those are weaknesses and limitations that databases can really help mitigate. They’re not going to completely solve them, but they can help mitigate that. Because we have lots of good data in our data structures already. Um, and so pairing the two, I think together, this is where you see that retrieval-augmented generation, or RAG, concept. Pairing the database with an LLM I think is going to continue to improve that story together.

Chris Engelbert: True. You said how to use it best or where to use it. The, I mean, right now there’s this meme going around, like, “I want my LLM to do my dishes and, I don’t know whatever.” Well, so it was, it was differently. “I don’t want my AI to do art and whatever. I want it to do the dishes, so I can do the art.”

Jennifer Reif: Yeah. I want to mitigate the low or delegate the low impact things to the LLM.

Chris Engelbert: Exactly. I can’t remember exactly what it was right now. Um, but if I find it, I’ll put it in the show notes. Um, I read that and I was like, yes, that is exactly it. Why do we give the complicated tasks or the stuff that we love to do to an AI instead of trying to offload the stuff we really don’t like? A good example of that would probably be writing the initial documentation for stuff. Um, looking at the source code, at the comments and coming up with an initial draft for the documentation of that, whatever. Um, I mean, most of us are engineers and engineers love one thing, which is writing code, but they hate the other thing, which is, well, love hate the other thing, which is documentation. So maybe, maybe that is something where we should look into and figure out if maybe it helps us that way. All right. Um, cool. Yeah. Um, that was a pleasure. Thank you very much for being here.

Jennifer Reif: Thank you so much for having me.

Chris Engelbert: My pleasure. Yes.
And for the audience, Jennifer prepared a demo which unfortunately doesn’t work for an audio podcast, but we’ll put it in the show notes. It will show you exactly like how you can set up a Neo4j on Kubernetes yourself. Um, and we may actually do a recording. Um, so I can put that as well. We’ll see. Maybe not yet. Maybe it’s somewhere in the near future. Like a plan. I know, I know. Sometimes I have plans, not a lot of times, but sometimes.

Jennifer Reif: Whether they actually get implemented, you know, who knows.

Chris Engelbert: Exactly. You can always have good ideas. And there’s plenty of those, not all of them are getting implemented. All right. Yeah. As I said, thank you very much. Uh, it was a pleasure. Uh, it was good to talk to you after two years, three years again. Yeah. Time just flies.

Jennifer Reif: Um, hopefully we’ll connect in person at a conference sometime in the future again.

Chris Engelbert: I hope so. I hope so. I mean there is a lot of database conferences, a lot of Java conferences, so there’s a good chance, I guess. All right. And for the audience, thank you very much for being here again. Uh, see you all next week. Uh, we’ll be next episode and the next guest. Thank you very much for being here. Thanks.