Intro To Neo4j & GraphQL
Presentation to the Systems Development and Frameworks class at HTW University in Berlin
This is a presentation given to the Systems Development and Frameworks class at HTW University in Berlin covering an introduction to Neo4j and a look at using GraphQL with Neo4j.
Links And Resources
Great, so welcome to our next lecture in the course Systems Development and Frameworks. Today we have a special guest. His name is William Lyon; earlier I had difficulties pronouncing his last name correctly, but I think I'm getting better at it. He is with us for a second time, and I'm really excited and glad that you're here to tell us a little bit about Neo4j and graph databases, and maybe more later on. So welcome, Will.

Thanks! Great, hi everyone. So I shared a link in the chat to the slides, so hopefully you can access those. What's the best way to take questions? Just ask in the chat, I can see that, or if you want to just interrupt me, that's probably fine too. So this is the agenda: I'd like to talk a bit about Neo4j, sort of an intro to graph databases, and then also talk about how we can use GraphQL with Neo4j, specifically how we can build GraphQL APIs that leverage Neo4j as the backing data layer. We'll go through a few different approaches: first a naive approach, where we use the client drivers for Neo4j to build a GraphQL API ourselves, then look at some of the problems that come up with that naive approach, and then at tooling that addresses those problems and makes things easier and more performant. So we'll look at a library called neo4j-graphql.js and a couple of other tools around it.

Maybe I can quickly interrupt, just to give you some background: we haven't covered databases at all, right? We are at the GraphQL level, and in the next homework we are going to add persistency to GraphQL. We basically didn't cover databases at all, so maybe you can also give us a quick introduction.
What Neo4j is, what's nice about a graph database, and so on. Yeah, that's perfect, great. And what's the time frame, by the way? How long should I yammer on? You have 20 minutes, and students are free to leave if you need more, I guess; I hope people are fine with it. Okay, cool, great.

So, a little bit about me. Like Robert said, I work at this database company called Neo4j, which makes an open source graph database, which we'll talk about. I work on a team that we call Neo4j Labs; our goal is to build extensions, plugins, and integrations with Neo4j and other tooling. So I don't work on the core database itself; instead I work on tooling and integrations around the database. I've also been writing a book called Full Stack GraphQL Applications with GRANDstack, that's GraphQL, React, Apollo, and Neo4j Database. That link in the top corner lets you download three chapters for free from the book, sponsored by Neo4j; they focus on the backend aspect, so maybe a deeper dive into what we're going to talk about today.

So, what is Neo4j? That's a good place to start. Fundamentally, Neo4j is a database management system (DBMS), or commonly just called a database. If you've heard of tools like Postgres or MongoDB, Neo4j is in a similar ecosystem of database tools, where the ultimate goal of a database is to let you store and query your data: some durable persistence layer for your application. There are lots of different ways to address that persistence problem, and lots of other aspects come up when you're choosing a database for your application, such as the performance characteristics for different access patterns: is it really fast to write? Can I shard the database in a cluster?
Can I have multiple instances of the database deployed, to scale its throughput and availability, so that if one database instance goes down, I have other resilient instances in my cluster? Anyway, these are some of the considerations to think about. The things that I find most interesting initially, when you start working with Neo4j, are the data model and the query language, which are a bit different in graph databases like Neo4j than in other database tooling. We'll talk a bit about those, and it's okay if you're not familiar with other databases; we'll introduce these concepts on their own.

Fundamentally, as a graph database, the data model we work with, the way we think about, query, and store our data in Neo4j, is a graph. In a graph, nodes are the entities, and they're connected by relationships; we can also store arbitrary properties, the attributes that describe the nodes and relationships. We'll look at a deeper example of this in a minute, but that's the general idea. We call this the property graph data model, because we can store properties on the graph.

Now, to work with our data in Neo4j, we use a query language called Cypher. There's an example here in the upper right corner. If you've heard of relational databases and SQL, you can think of Cypher as sort of the equivalent of SQL, but for graph databases like Neo4j. The fundamentally important aspect of working with Cypher is the concept of graph pattern matching. With Cypher, we declaratively draw, essentially in ASCII-art notation, the graph patterns we're looking for. You can see in this example we're saying MATCH, which basically means "find this pattern in the database", and the thing that follows MATCH is the pattern we're looking for. So, starting off:
We're saying: find Address nodes. Nodes are represented in parentheses, sort of like drawing a circle for a node. Then we have this "a colon Address": the a becomes a variable that we can use to refer to the node that matches this piece of the pattern later on, and the thing after the colon, Address, is called the label, which is just a way to group or describe nodes. We're saying: find Address nodes that have this REGISTERED_ADDRESS connection, drawn with square brackets and this sort of arrow ASCII-art notation that represents a relationship. So: a relationship coming into the address, connected to an Officer node, with an outgoing relationship to an Entity, and here we've left off the type of that second relationship. You can see we can construct arbitrarily complex patterns. This is saying: find addresses that are connected to an Officer node through this REGISTERED_ADDRESS relationship, then find all of the entities connected to that officer, and then we're filtering WHERE the address contains "New York". So this finds all of the Entity nodes connected to officers with an address in New York.

This query comes from the Panama Papers data set, a large-scale data journalism investigation that looked at offshore legal entities connected to banks and various people throughout the world. The group investigating that data set used Neo4j to model these connections between offshore legal entities, the people connected to them, and where they're located throughout the world, because it's a very complex structure. That's one of the things graph databases are really good at: modeling very complex, connected data.
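As a rough sketch of the query being described (the exact labels, relationship types, and property names in the real Panama Papers data set may differ), the pattern looks something like this in Cypher:

```cypher
// Find entities connected to officers registered at a New York address.
// Nodes are drawn in parentheses, relationships in square brackets with arrows;
// the second relationship deliberately leaves its type unspecified.
MATCH (a:Address)<-[:REGISTERED_ADDRESS]-(o:Officer)-->(e:Entity)
WHERE a.address CONTAINS "New York"
RETURN e
```

The whole traversal is expressed as one declarative pattern rather than a series of joins.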
If you think of offshore legal entities, you might have a business registered in the Cayman Islands that owns another business registered in Panama, which owns another business, and that business owns a plane. You want to be able to traverse this graph to find the person who owns the company that owns the company that owns the company that owns the plane, right? So anyway, that's one example of where graphs come in very handy: representing very complex, connected data sets.

Once you have your data in Neo4j, there are lots of interesting things we can do with it. We can build applications on top of Neo4j, like web applications, which is what we'll talk about today. But there are also lots of interesting use cases for graph analytics: we can run graph algorithms, like PageRank or community detection, to find clusters or the most important nodes in our network, for things like recommendations or search engine optimization. Then there are use cases for graph visualization tools: I may want to interpret the results of these algorithms, or explore the graph visually. Neo4j is available to download locally, or there's a hosted solution in the cloud; you can also deploy it to AWS or Google Cloud Platform, that sort of thing. And then, as we'll discuss a bit later, there's a GraphQL integration as well, which makes it more straightforward to build APIs that sit between our client application and the database. So you can think of Neo4j and the tooling and use cases around it as a graph platform: the database sits at the center, and there's different tooling depending on what you're trying to accomplish.
In our case, we're probably most interested in the upper left quadrant, where we're building web applications that talk to some API layer we're building. But it's important to keep in mind that there are other use cases. I mentioned the graph analytics aspect, more of the data analyst or data scientist use case, where they're not necessarily interested in building a web application but in running graph algorithms, maybe as part of a machine learning pipeline. So: different use cases and audiences with different tooling, but the core Neo4j property graph model and working with Cypher remain the same.

We talked a bit about the idea of the property graph model; here's an example. We have three nodes, and the caption of each node (Employee, Company, City) is called the label of the node. The label is a way to group nodes; it describes what type of node it is. Nodes can have one or more labels; here we just have one each. Then we can store arbitrary key-value-pair properties on the node. Here we have a property with the key name and the value "Amy Peters". We can store strings, integers, date/time representations, geospatial point types, these sorts of things. Relationships have something similar to labels, called a type: whereas nodes can have multiple labels, relationships have a single type, and a direction. You can see there's a direction connecting the company to the city it is located in. So one way to think of the property graph model is as describing a sentence, where the nodes are nouns, the relationships connecting the nodes are verbs, and the properties are like adjectives or adverbs that describe the nodes and relationships. Here's another example.
Maybe this one is a bit more concrete. Here we have the data model for an online store, an e-commerce store, or something like that. We have an Order in the middle; it has an order ID. Then we can see the Customer who placed the order (his name is Bob) and the Employee who fulfilled it. Here's an example where we have multiple labels on the nodes, which is a way to describe almost a hierarchy. Bob and Emilia are both Person nodes, so they both have the Person label, but we have a more specific way to describe Bob, as a Customer, and Emilia, as an Employee. That additional label gives us a greater level of specificity when we're thinking about our data model.

I see a question from Jennifer in the chat: is there a rule or convention that relationships need to be in capital letters? So, that's purely a convention. There is a Cypher style guide that describes it; here we go, I'll drop the link in the chat. The convention is for node labels to use Pascal case and relationship types to use all uppercase, and the style guide also goes into how to handle keywords, property names, and that sort of thing. It's fairly opinionated, but those are just conventions, not rules. It's helpful, I think, to have these conventions as a standard, so that when you go from looking at how one user defines their data model and writes their Cypher queries to looking at another's, you can very quickly build the mental model: you see something in all caps jump out at you and know, yes, that's a relationship type. That's the benefit of the convention.
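A minimal sketch of those conventions in Cypher (the labels, types, and property names here are only illustrative):

```cypher
// Node labels in PascalCase, relationship types in UPPER_SNAKE_CASE,
// property keys and variables in camelCase, Cypher keywords in uppercase.
MATCH (p:Person)-[:WORKS_AT]->(c:Company)
RETURN p.name, c.name
```

As noted in the discussion, the database itself does not enforce any of this; only labels, types, and property names are case-sensitive identifiers, while keywords like MATCH work in any case.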
Yeah, good question. And then, as Jennifer says, MATCH in all lowercase works just the same as MATCH in uppercase; right, so there's just the convention of using all uppercase for keywords, it's not enforced at all by the database. Good question.

So, back to our example of the e-commerce property graph model. We talked about the concept of multiple labels on a node, and we have a similar thing for our products: we have two products in this order. What did we buy? Some spaghetti noodles and a book. The other interesting thing to point out is that we can store properties on the nodes or on the relationships, and so the question comes up: when should I store a property on a node versus a relationship? The answer: if the property describes just the entity, just the node, it belongs on the node. Examples here are the SKU, the ID for the product, and the product's description; those have nothing to do with the order, so they clearly belong as properties on the node. But then we have the number of this product in the order: the order contains two packages (or whatever) of spaghetti noodles. That information isn't describing just the order or just the product; it's describing the product in the context of the order. That property describes both the order and the product, so it belongs as a property on the relationship. Similarly for the date this order was fulfilled by this employee: another employee could also have been working to fulfill the order on a different date. The date is specific to both the order and the employee, so it belongs on the relationship.
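That rule of thumb can be sketched like this (the property keys and relationship types are assumptions for illustration): the SKU lives on the Product node, while the quantity lives on the relationship, because it only makes sense for a particular order and product together:

```cypher
// Properties describing one entity go on the node; properties describing
// a pair of connected entities go on the relationship between them.
MERGE (p:Product {sku: "NOODLES-01", description: "Spaghetti noodles"})
MERGE (o:Order {orderId: 42})
MERGE (o)-[c:CONTAINS]->(p)
SET c.quantity = 2   // two packages of this product, in this particular order
```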
That link down at the bottom points to a simple graph diagramming tool called Arrows that runs in the browser and lets us draw these property graph models. This is one I was working with previously, designing the data model for a podcast application; it just lets us draw these models and think about them. One thing that's nice is that I can export the markup, take this code that describes the data model, and share it with my collaborators; I can check it into version control, like Git or GitHub, and actually version the data model, which is quite nice. Cool.

So, given the data model we've described, with customers, employees, orders, and products in the order, how do I work with that using Cypher? Here's an example. This query, the pattern we've built up, is a bit more complex than the first example we looked at with the Panama Papers data. We're now saying: find this pattern where a customer placed an order and that order contains a product, where the customer's name is Bob, and then find other orders that contain that same product. So: find Bob, find all the orders Bob has placed, and the products in those orders; what products is Bob ordering? Then find other orders that contain those same products. Then we have another MATCH, so we traverse out again: from those orders that share products with Bob's orders, what products do they contain? We bind this product to the variable rec, and we filter with another predicate, where this recommended product is not one of the original products; basically, where Bob has not already placed an order for this product. And we return rec. So this is a product recommendation query.
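The traversal being described might be sketched like this (the relationship types PLACED and CONTAINS, and the property names, are assumptions based on the model discussed above):

```cypher
// "Customers who bought what Bob bought also bought..."
MATCH (c:Customer {name: "Bob"})-[:PLACED]->(:Order)-[:CONTAINS]->(p:Product)
MATCH (p)<-[:CONTAINS]-(:Order)-[:CONTAINS]->(rec:Product)
// Exclude products Bob has already ordered
WHERE NOT (c)-[:PLACED]->(:Order)-[:CONTAINS]->(rec)
RETURN rec.description, count(*) AS frequency
ORDER BY frequency DESC
```

Counting how often each candidate co-occurs and ordering by that count is a common way to rank the recommendations.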
This is very similar to online shopping on something like Amazon, where you're looking at a product and you see recommendations saying: people who bought this product also bought these other products. This is exactly the type of query that runs on the backend to generate those recommendations. You can see how relatively straightforward it is to define these traversals through the graph: starting at our customer, tracing out to their orders, the products, the orders that contain those products, and then what other products are being bought by people who buy the same things Bob buys. Collaborative filtering, I guess, is the terminology for this sort of product recommendation traversal, this sort of personalization. It's a really great example of a use case where graph databases are commonly used, because they're optimized for a couple of things. One, they're optimized for expressing this sort of traversal: writing it in Cypher with this ASCII-art notation makes it fairly straightforward to reason about how we're traversing through the graph. But the database itself is also optimized for these traversals. Every time the database traverses from one node to a connected node through a relationship, that's a very fast operation in a graph database like Neo4j, because every record essentially has a pointer to an offset on disk (that's how it's implemented); it's basically just chasing a pointer to exactly where in the file store that node is represented. Other databases may instead have to do an index operation, taking two large sets of data and seeing where they overlap, which is sort of the equivalent of this traversal, often called a join operation.
Building a database is all about optimizing for different patterns and use cases, and the optimization graph databases make is to do these sorts of traversals very efficiently, which differs from the optimizations other databases make. Anyway, that's just an example of the type of query we might see in an e-commerce application for personalization.

Let's talk a little bit about some of the tooling we might use with Neo4j. Neo4j Browser is typically one of the first tools you use to interact with Neo4j. You can think of it as a query workbench: it lets us send Cypher queries to the database and then visualize the results. It's also a nice getting-started and educational tool; there are ways to embed guided exercises, queries, and text. This is the default tool we initially use to work with Neo4j, so let's take a look at an example. I'm going to switch to Neo4j Browser. If you want to try this on your own, there are a couple of ways to get started: you can download Neo4j (the default download is called Neo4j Desktop, which we'll look at in a second), or there's an online sandbox environment that lets you spin up databases pre-loaded with data sets. I'm going to switch to Neo4j Desktop; while I'm switching, are there any questions so far on what we've talked about?

Yeah, maybe. What I really like when talking about graph databases: most of the students probably have experience with SQL, table-based databases, and MongoDB, which is a document-oriented database, and I really like to have this mental model in mind, that Neo4j is a graph database.
It takes the graph as the central data structure, while the other databases usually also expose implementation details, for example foreign keys, to the end user, the developer. That's a fundamental difference from Neo4j, which hides this implementation detail, so you can focus entirely on the structure of your data. Also, things like a dangling pointer, a foreign key in a SQL database pointing to nowhere: that's not possible in Neo4j, right? I think that's one of the key fundamental differences compared to the databases that most people come into contact with during university.

Yeah, that's a good point. When we're talking about doing these sorts of traversals, like customer placed order, if you're doing that in a relational or SQL database, instead of declaratively writing that pattern, you're writing a SQL JOIN statement and trying to figure out which foreign key you're joining on, which column overlaps between customer and order, right? In my mind, it's a bit more straightforward to represent that in a graph database.

Yeah, and also, when I was at university I had a little bit of graph theory and graph algorithms, and the people there are mathematicians, so they don't even worry about implementations; they develop algorithms for abstract graphs. They have these concepts: first of all a graph, then a directed graph, then a labeled graph, all of these things, and Neo4j implements that. I'm not 100% sure whether graph theory or anything like that is taught at HTW, but these algorithms, developed in that abstract mathematical world, are applicable to graph databases, and that's also a nice connection to keep in mind, I would say.
Yeah, and since you mentioned graph algorithms: there's a very interesting project called the Graph Data Science library for Neo4j, which implements a lot of these algorithms as what are called procedures, essentially ways you can use them from Cypher. Things like PageRank (a centrality algorithm), similarity algorithms, community detection. We won't really dive into that today, but if you're interested in graph algorithms, this is a great library for working with them in Neo4j. There's also what's called a Pregel API, so you can implement your own custom algorithms in Neo4j using it. So yeah, lots of interesting use cases for graph databases, for sure. And there's one question in the chat: would you say that graph databases replace table-based databases, or are there any downsides to graph databases?

That's a really good question. Fundamentally, I guess you can think of graph databases and relational databases (or document databases) as interchangeable for the more common use cases. If I'm building what is commonly called a CRUD application, a web application with simple create, read, update, delete functionality (a to-do app is a very common example, where I want to make a list of all the things I have to do, then edit them and check them off), then sure, I can build that in a graph database, and I can build it in a relational database with SQL. They're sort of interchangeable there, and there probably isn't much advantage to using one type of database over the other.
However, when I have more complex applications, or when I have to think about the scale of my application, if I want to support millions of users and tens of thousands of concurrent users, if my data model is very complex, if I have complex authorization rules, then the different types of databases, with the optimizations and trade-offs in their implementations, start to make a difference. I mentioned earlier that graph databases are optimized for these local traversals, going from one node to a connected node to another. In our product recommendation query we have one, two, three, four hops in the traversal, but you could easily imagine a complex query that goes maybe 10 or 12 hops out. If you have a complex access pattern like that (in this case we're generating product recommendations or personalization, but it could be other things, like routing, where I want to find the most efficient path from the grocery store to the brewery or whatever), a graph database is going to be optimized for it. Doing that in a relational database is going to be, one, much harder to express in SQL by writing those joins, like Robert said, thinking about foreign keys and where they overlap; but the performance in a relational database is also going to start to break down, because each of those join operations uses what's called an index: it's basically taking two tables, putting them together, and using an index to see where they overlap. As I get very large tables and lots of join operations in a single query, that gets really, really slow. In a graph database, by contrast, the performance of a traversal is basically a constant-time operation.
In computer science, you have this concept of big O, which is basically about how the performance of an operation scales as the size of the data scales. A traversal in a graph database is a constant-time operation, while a join operation in a relational database depends on the overall size of the data: as I add more and more rows to my tables, join operations slow down. So that's one area where a graph database will give us better performance than a relational database. But the original question was: what are the downsides of a graph database? Every database is a choice among trade-offs, and the trade-off we're making with a graph database is that we're materializing these relationships at write time. In a relational database, when I make a write operation, I'm adding an entry in a table, another row; I'm not storing the relationship in the database at write time. In a graph database I am: every relationship is a first-class citizen in the database, and at write time I'm persisting that relationship. So in a graph database you may see different performance for write operations than you would expect from a relational database. That's perhaps one downside, or at least one way the access patterns of a graph database differ.

Another is sharding our data. If we want to split our data set into what are called shards (basically, if we have lots and lots of data, say multiple terabytes that we can't store on a single machine), we want to shard it and spread it across multiple machines. Then, when a query comes in to the database cluster, we know which database instance to route it to, based on which piece of data it's querying.
Well, a graph is a very connected structure, so it's more difficult to shard the graph and figure out which piece goes on which machine, because when a query comes in, I don't want to incur the network latency of going to a different database instance in my cluster to access a different shard of the graph. You can create shards in Neo4j, that's possible, but you then have to think about the performance implications of a query that has to make a network request from one machine to another to do a traversal. So yes, there are trade-offs and optimizations in different databases, but it really depends on what type of application you're building and what your access patterns are.

Yeah, I would really like to add that it depends on your use case. For example, if all you're interested in is crunching numbers, you have a table and all you want is the average of one column, or the sum of one column, then obviously a table-based database is better. But I was, for example, working on a project that was a social network, and we were not crunching numbers; we were interested in the structure. We had users, they wrote posts, they were friends with each other and following each other, and obviously we were interested in that structure, and that's where a graph database really shines, I would say. So it all depends on the use case. I don't think graph databases will replace table-based databases. Table-based databases have their use cases, and so do document-oriented databases. It's just that, in the last 20 or 30 years or so, it was maybe easier to implement table-based databases that were performant, and now we have more options and we can choose the best database
for our particular use case. We should stop thinking inside the box of "the database is a relational SQL database": I have a plethora of databases from which I, as a developer, can choose for my particular use case, right? And I find it really strange that graph databases are still considered exotic or something, because for me they actually are not, since most of the time, or very often, the use case is about the structure of the data. Yeah, most of the time that would be a great use case for a graph database.

Yeah, there's also another question. Marcel asks: isn't every node kind of like a table as well? So, if we go back to maybe this diagram, or maybe this one, which doesn't have multiple labels (that sort of muddies the water): the way I think about it is in terms of the node label. Take the Employee label, where I'm going to have multiple nodes with that label, one node per employee of a company. Comparing the label to a table, the Employee label is kind of like a table called employee in a relational database: a way to group nodes, right? Representing this in a relational database, I might have an employee table, a company table, a city table, and then, for the relationships, a join table joining employee to company, with additional attributes, here storing the start date. So yes, thinking of labels as essentially a table, or a way to group nodes, is a good way to think of it. Yeah, and you can represent tables as graphs, of course. I think we are now halfway through, it's 45 minutes, so let's maybe continue, I would say.

Yeah, so I'm going to jump over to Neo4j Desktop, and I have some databases loaded here; let's create a new database just so we can see what this looks like.
So this is what the default download of Neo4j Desktop will look like. You can create projects, and each project can have multiple databases. When I create a new one — let's call this "my database" — I need to give it a password. In Neo4j Desktop I can either create local databases running on my machine, or remote connections that let me connect to databases I'm hosting in the cloud somewhere — on Neo4j's cloud service, Neo4j Sandbox, or wherever else I've deployed a database. Then I'm going to open Neo4j Browser, which allows me to write queries and start working with Neo4j. Let's try that again — there we go; let me make this a little bigger. Okay, so the first thing I'm going to do — this database is empty — is run something like `MATCH (n) RETURN count(n)`. This query says: match every node. That's what the parentheses mean. I could specify a label here, but a graph pattern can be as specific as I want it to be. Here `n` is a variable that's going to be bound to any node in the database, and we return the result of an aggregation function, `count`, which gives me the number of nodes — there are zero. So let's add some data. I mentioned that we have these Browser guides; I can access those by running a `:play` command, which pulls in a browser guide that embeds queries and text — educational material — that I can click on. The one I loaded, `:play grandstack`, is a sample data set that I like to use, with data on businesses and reviews. Here we're loading data from a CSV file. We haven't talked about how we create data in the database, but we do it in a very similar way to how we match on data: we describe the pattern that we want to create.
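As a sketch of the read queries described above — where the pattern can be made progressively more specific — these assume a hypothetical `User` label purely for illustration:

```cypher
// Count every node in the database, whatever its label
MATCH (n) RETURN count(n);

// Count only nodes with a specific label
MATCH (u:User) RETURN count(u);

// Match on a specific property value as part of the pattern
MATCH (u:User {name: "Will"}) RETURN u;
```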
So here, instead of the MATCH we saw before, we're saying MERGE. MERGE is a write operation that avoids creating duplicates. We could also say CREATE, and then we'd create these business nodes — one business node for each row in this CSV file — but we don't want to create duplicates, and we have a business id field here, so we say MERGE to avoid creating any. Other databases use the terminology "upsert" or "create-or-get" for this: does this thing exist? If not, create it; if it does exist, just match on it. Anyway, we go through that process, creating businesses, users, and reviews, then connecting them together. If I run this query, it says I've added 36 nodes, set some properties, and created some relationships. Now I can say `CALL db.schema.visualization`, and this will inspect the data and give me a graph view of the data model — these are the node labels and relationship types in the database. I have users that wrote reviews, reviews that review businesses, and businesses that are in categories. Now let's write a query: let's find a user and look them up by name. Here we're matching on a user node, and then we're adding, in curly braces, specific properties that we want to match on as well, so that becomes part of our graph pattern. We're saying: find a user whose name is Will — this is equivalent to this other representation as well. Okay, so here's the Will node, and if I double-click on these I can traverse out and find the reviews that Will wrote. Those reviews are connected to businesses — here's a review of Kettle House, which is a brewery, so it's in the categories Beer and Brewery. If I keep traversing, I can find other businesses, and reviews and users that have reviewed those businesses.
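A sketch of the MERGE import described above, assuming a hypothetical CSV with `businessId`, `name`, and `city` columns — the URL and column names are illustrative, not the actual dataset:

```cypher
LOAD CSV WITH HEADERS FROM "https://example.com/businesses.csv" AS row
// MERGE matches an existing node with this businessId, or creates one
MERGE (b:Business {businessId: row.businessId})
// SET the remaining properties whether the node was matched or created
SET b.name = row.name, b.city = row.city;
```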
Jonas asks: is `b.name` something like a key, and therefore colored differently? So in this case `u` is a variable, and the syntax highlighting in this Cypher query editor is trying to give us some semantic information with the colors: `u` is a variable that refers to any nodes matching this pattern, so it's colored a bit differently than the Cypher statement commands. You say it's in the initialization script — yes, we should see a similar case in the initialization script, after we create this business node. In this case we're iterating through this CSV spreadsheet — that's what this `LOAD CSV WITH HEADERS` line means: grab the CSV file, iterate through each row, and for each row create a business node — and the `b` here is now a variable. You can see here, though, and I think this is your original question, that `b.name` is colored differently from `b.city`. I think that's the simplistic approach of the Cypher editor: `name`, I believe, is a Cypher keyword, and the editor is coloring it differently because it thinks it's a keyword and not a property. So that, I think, is just a bug in the Cypher editor — a very keen eye if you noticed that. Cool. So we saw how we can write a query and then double-click to traverse out and explore the data. But what if we want to write a query that expresses this traversal? We don't want to just be double-clicking here; we can write more complex queries. I think I have a few saved here... oh, that's for a different one. Let's switch to the Browser here — you'll notice that we have these saved queries in Neo4j Browser.
If you write a query and then click this favorite button, it'll be saved in local storage, but only in that browser instance of Neo4j Browser — so if I go here to Neo4j Desktop, I have different queries saved. This is for a different data set, but anyway, what I want to show is how we can construct more complex traversals based on that initial predicate we wrote. We said: find the user named Will. Now we're just adding more complex traversals to that pattern, finding not just the user but also the reviews they wrote and the businesses those reviews are connected to, just by building up this graph pattern. You'll notice that as I hover over or click on these nodes, I can see the property values down here — this Ducky's Car Wash business node has an address, business id, city, latitude and longitude, and so on. And here's a more complex query: a business recommendation query. This is very similar to our product recommendation query, where we looked at what products are bought by people who buy the same products that I do — now applied to businesses. What businesses is this user, Will, reviewing? What other businesses are reviewed by the users who review those same businesses? One of the sections looks at categories: what categories are the businesses I'm reviewing in, and what are other businesses in those same categories where I have not reviewed that business — so, recommend new businesses to me that are in categories I'm interested in. And here are a few businesses this user might be interested in that they haven't reviewed yet, based on overlapping categories. A similar concept to the query we saw before, but I just wanted to show how we can use Neo4j Browser to send these queries to the database and work with the results.
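A sketch of that category-overlap recommendation, assuming label and relationship names in the style of the GRANDstack sample data (`User`, `WROTE`, `REVIEWS`, `IN_CATEGORY`) — treat the exact names as illustrative rather than the query shown in the demo:

```cypher
// Businesses Will reviewed -> their categories -> other businesses
MATCH (u:User {name: "Will"})-[:WROTE]->(:Review)-[:REVIEWS]->(b:Business),
      (b)-[:IN_CATEGORY]->(:Category)<-[:IN_CATEGORY]-(rec:Business)
// Exclude businesses Will has already reviewed
WHERE NOT (u)-[:WROTE]->(:Review)-[:REVIEWS]->(rec)
// Score each candidate by the number of overlapping category paths
RETURN rec.name AS recommendation, count(*) AS score
ORDER BY score DESC LIMIT 5;
```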
You'll notice here that we also have a table view, and we're returning both the node and a score value. The score in this case is the number of paths — basically the number of overlapping categories that got me to this recommended node — and we have the graph view as well. So we can return graph objects (nodes, relationships, paths), or scalars and more tabular data, and we can mix those. It's probably more helpful if I just return, say, the name of the business, and then I get tabular data that says: okay, this is the top recommendation, it has a score of two. Okay, so that was a quick look at Neo4j Browser and how we can write queries and visualize the results. In most cases, though, we're interested in building an application. The Browser is useful for development, where I'm writing queries and visualizing the results; typically we're building an application and want to query the database from our application code. This is where the Neo4j client drivers come in. There are drivers in lots of different languages — packages and libraries that I can use in my application code that take care of connecting to the database, let me send Cypher queries, and then serialize the results so I can work with them in my application code. This is an example using the JavaScript driver. The drivers have a similar API and similar concepts: I create a driver instance that holds the connection credentials for the database, so it knows how to connect and how to authenticate as a database user. Then, when I'm ready to query, there's the concept of session objects: I instantiate a session from the driver, I have a query, and I can send parameters — here this is creating a person node and setting the name, passing parameters from my application code — and then I get back a result set that I can iterate through.
The different languages you're using have similar concepts — driver, sessions, result sets — but they're idiomatic to the language you're working with. Okay, so that's a rough overview of Neo4j and some of the tooling we use with it: the property graph model, Cypher, and a brief look at the client drivers. What we're going to talk about next is how we can build GraphQL APIs using Neo4j. Let's look first of all at why we would do this. There are some benefits to using GraphQL with graph databases like Neo4j, and we'll talk about a few of those, but I think the one that stands out is the graph data model. With GraphQL we're describing our data model for the application layer as types and how they're connected, and that builds up a graph. The way we represent data in a graph database is very similar, so we have this concept of graphs both in our database and in our application logic — GraphQL makes the observation that your application data is a graph. So I think there's a benefit in having a very similar data model both in the database and in our application data. When GraphQL was first open-sourced several years ago, this was something we at Neo4j were interested in: did it make sense to use GraphQL with Neo4j, to build GraphQL APIs on top of it? I mentioned that I work on a team that builds integrations between Neo4j and other technologies, so we were very interested in seeing whether there were integrations we could build to make this process of building a GraphQL API on top of Neo4j more straightforward, or to optimize any part of that process.
So we started talking to a lot of users, both in the Neo4j community and in the GraphQL community at large, to see what pain points they were running into and to find opportunities for an integration we could build. Ultimately this led to what we now call GRANDstack, a combination of technologies that work together for building full-stack applications: GraphQL, React, Apollo, and the Neo4j Database. grandstack.io has the documentation for this. The most interesting aspect, though, and what we'll talk about today, is the GraphQL–Neo4j integration piece: how can we build GraphQL APIs that generate queries to fetch data from Neo4j? Can we use the GraphQL type system to define the data model in the database? These sorts of things. If we look at what I'll call the standard approach to building a GraphQL service — and I think this is something you've covered in previous lectures — the standard way is to create your GraphQL type definitions. You can do this using the GraphQL schema definition language or by constructing your schema programmatically, but fundamentally you're defining the types in the API: what the objects are, what fields they have, how they're connected. The next step is to implement resolvers. These are the functions actually responsible for fetching data from the data layer — that can be a database, multiple databases, even other APIs, or any combination of those — but the resolvers are the functions that hold the actual logic for resolving the GraphQL request. Then we use something like Apollo Server to attach these resolvers to the type definitions and serve the GraphQL schema, so that it can handle incoming network requests carrying a GraphQL query, execute those resolvers, and send the data back.
This is the general approach for building a GraphQL service, so we start with GraphQL type definitions. We won't go into too much detail here, because I think you've covered this previously, but these type definitions describe all of the data available in the API. They specify the entry points for the API on the special Query and Mutation types, and again we can do this using the schema definition language, which is the example we have here, or programmatically. Then come the resolvers. These functions are attached to our schema — they map to the types and fields we've defined in our GraphQL schema — and inside a resolver we may do things like initialize a connection and a query to a database. We might have to think about enforcing authorization rules; we can do that elsewhere, maybe in middleware or something like that, but we can also do it inside resolvers. And we may have to validate or format the response that comes back from the database before returning the results. Each resolver is passed a few arguments. The first one, the object argument, is the result of the object we're currently resolving; if this is the root resolver — a Query field resolver — this first argument will be undefined, because we haven't resolved anything yet. Then we have the parameters argument: these are the field arguments, so if we've passed any arguments to the current field we're resolving, those will be available there. This is common if we're doing something like pagination, with a first or offset parameter, or some filtering. Then we have the context object, and in the context object
we may have things like a connection to the database. It's common, when using for example the Neo4j drivers that we looked at earlier, to inject a driver instance with a connection to the database — or maybe some other API abstraction layer — into the context object. This is useful because then, in our resolver function, we can make that connection to the database. The fourth argument is one we may not commonly use in manual implementations of our resolver functions, where we're writing these resolvers by hand, but it's very important for some GraphQL integrations — tooling that generates data-fetching code — and that's the GraphQL resolve info object. This object has information about the GraphQL schema and the GraphQL operation I'm currently resolving. Okay, and this example is for a GraphQL API representing conference sessions: I have sessions that are in a room and have a theme, and then I have recommended sessions. Using something like an ORM or some API abstraction layer, my resolvers might look something like this, where I have some object in the context that gives me a connection to the database, or wherever I'm resolving this data from — it doesn't really matter, I just have some abstraction over it — and I'm using this ORM-like object to make these queries. So I'm saying: find sessions by some search string, grabbing the search string from the parameters passed into the initial query field. Then, when I resolve the nested data for, in this case, my session object, I go back to the database and say: here's the session I'm resolving — I grab that from the object argument, the session resolved so far — now find the room for this session.
I look it up by session id, same thing for the theme, and then maybe I have some logic for finding recommended sessions, so I need to go back to the data layer again with the session id to find those. This is a common pattern: implementing these resolvers using some ORM-like object to go back to the database. On the left here is an example from another API — this is actually from the GraphQL API that powers the Neo4j community site, community.neo4j.com. This is our Discourse forum for Neo4j, but all the data for this top part and these activity feeds comes from a GraphQL API, and it looks something like what's on the left there: we have a Cypher query, we use the Neo4j driver to issue the query, and then we have to format the results that come back a little bit. So we can construct database queries directly in our resolvers, or work with some other ORM-like object — that's a common abstraction as well — and then we use something like Apollo Server to serve this executable schema and handle network requests. But there are a few problems with this, what we'll call naive, approach. One is schema duplication: we have to maintain a schema for the database and one for the API, and we end up with a mapping and translation layer in our resolvers, where we may be going from a graph model in our GraphQL API to a different model if we're using something like a relational database on the back end. We end up writing a lot of boilerplate code — setup for issuing a query, writing the query, manipulating the results — a lot of standard data-fetching code that can slow us down as developers. And then there's the N+1 query problem.
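The resolver signature described above — object, params, context, info — can be sketched with plain functions and an in-memory stand-in for the data layer. All the data and names here are hypothetical, purely to show the calling convention:

```javascript
// Hypothetical in-memory "data layer" standing in for an ORM or driver.
const db = {
  sessions: [
    { id: 1, title: "Intro to Graphs", roomId: "A" },
    { id: 2, title: "GraphQL Basics", roomId: "B" },
  ],
  rooms: { A: { name: "Main Hall" }, B: { name: "Room 2" } },
};

// Resolver map in the usual (object, params, context, info) shape.
const resolvers = {
  Query: {
    // Root resolver: `object` is undefined, `params` holds field arguments.
    sessions: (_object, params, context) =>
      context.db.sessions.filter((s) => s.title.includes(params.search)),
  },
  Session: {
    // Nested resolver: `object` is the session resolved so far.
    room: (object, _params, context) => context.db.rooms[object.roomId],
  },
};

// Simulate what a GraphQL executor would do for { sessions { room } }.
const context = { db };
const found = resolvers.Query.sessions(undefined, { search: "Graph" }, context);
const room = resolvers.Session.room(found[0], {}, context);
console.log(found.length, room.name);
```

In a real server these functions would be attached to the schema by something like Apollo Server, which passes the arguments in exactly this order.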
In this example, where we're making multiple requests to this database object in each of our resolvers, each request is a round trip to the database, so the overall performance of a nested GraphQL query that accesses all of these fields is going to suffer as it waits for multiple round trips. That's called the N+1 query problem, and there are performance implications there. Those are the problems we'd like to address in a GraphQL integration. So what tooling exists to address these common problems with this naive approach to constructing GraphQL APIs? That leads to a class of tooling roughly, or loosely, called GraphQL engines. There are a few of these tools; they work in slightly different ways and address slightly different use cases, but they're fundamentally tools for auto-generating your GraphQL schema, generating database code, or both. There are tools for Postgres, some hosted solutions from AWS, and then what we're going to talk about: the Neo4j GraphQL integration. So take the high-level goals based on the problems that come up with this common, naive approach: we have a lot of boilerplate reducing developer productivity, because I have to hand-write a lot of similar data-fetching code that we could probably generate. But I also want my API to be extensible — I want to be able to add custom logic — and I want to address best practices for performance: I want my queries to be fast and efficient. Those are the high-level goals we set out with, after looking at how people were building GraphQL APIs in general and with Neo4j.
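The N+1 problem at the start of this paragraph can be made concrete with a mock data layer that counts round trips — the function names here are illustrative, not a real driver API:

```javascript
// Mock data layer that counts round trips, to make the N+1 problem visible.
let roundTrips = 0;
const fetchBusinesses = () => {
  roundTrips++; // one query for the list
  return [{ id: 1 }, { id: 2 }, { id: 3 }];
};
const fetchCategoryFor = (businessId) => {
  roundTrips++; // one extra query per business
  return { name: `category-${businessId}` };
};

// Naive nested resolution: 1 query for the list + N queries for the nested field.
const businesses = fetchBusinesses();
const withCategories = businesses.map((b) => ({
  ...b,
  category: fetchCategoryFor(b.id),
}));

console.log(roundTrips); // 1 + N = 4 round trips for N = 3 businesses
```

A GraphQL engine that compiles the whole request into one database query reduces this to a single round trip, whatever the nesting depth.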
Specifically, that leads to a few features we wanted in this integration. The first is what's loosely called GraphQL-first development: the idea that when we start our application, we start by defining the GraphQL schema, or at least the GraphQL types. The GraphQL type definitions then define the API, and they also define the database model, so we don't need to maintain two separate schemas — one for the API and one for the database. Instead, we drive both from our GraphQL type definitions. We also want to auto-generate the CRUD operations — create, read, update, delete — for our API. If we just define the types — say we have movies, and movies are connected to genres, actors, and directors — then we're going to want to create movies, create actors, connect actors to movies; we want to search and filter movies by title, year, and genre — all these common operations — and we don't want to have to add them to our schema manually. Instead, we want to auto-generate them. So the Neo4j GraphQL integrations take care of adding the Query and Mutation types, with entry points for every type we've defined in our GraphQL type definitions, adding arguments for things like ordering, pagination, and complex filtering, and exposing things like the date-time and geospatial types in our schema. Here, for example, we have a movies data set and we're filtering for zombie comedy movies released after 1994, or movies that starred Jesse Eisenberg and are animation. You can see how we can construct complex filtering operations without really having to write much code. By the way, if you want to play around with this data set, movies.grandstack.io is a hosted GraphQL endpoint you can use to see how some of this works.
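As a hedged sketch of what such a generated filter query can look like — the exact argument and field names depend on the library and its version, so treat these as illustrative:

```graphql
{
  Movie(
    filter: {
      AND: [
        { year_gt: 1994 }
        { genres_some: { name_in: ["Comedy", "Horror"] } }
      ]
    }
    orderBy: year_desc
    first: 10
  ) {
    title
    year
  }
}
```

None of these filter arguments were written by hand; they were added to the schema by the augmentation step, derived from the type definitions.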
Now, we talked a lot about this problem of boilerplate code in our resolvers and having to write a lot of data-fetching logic. It would be really nice if we could just generate this data-fetching code from GraphQL, and that's a big part of the goal of the Neo4j GraphQL integration: take arbitrary GraphQL requests and automatically generate Cypher queries to resolve them against the database, so that as developers we don't have to think about that for simple CRUD operations. But we do want to be able to add custom logic to our schema, and we do that using the cypher directive. In this case we're attaching a Cypher query, written by the developer, to a field in the schema to define some custom logic. Going back to our business reviews application, we have a recommendation query that says: if you're interested in this business, here are some other businesses you might be interested in, based on users that have reviewed similar businesses. That query is then bound to this recommended field, adding custom logic to our schema with Cypher. There are a few different implementations of the Neo4j GraphQL integrations: there's neo4j-graphql-js, there's a Java flavor, and there's a database plugin. We're probably most interested in the Node.js, the JavaScript, version. Again, just to zoom out at a high level: the point of this library is to help build GraphQL APIs that sit between the client and the database. We're not sending GraphQL directly to the database — GraphQL is not really a database query language, and we still need that API layer where we want to implement things like authorization or caching — so we don't want to send queries directly to the database, but we do want to make it easier to build this API layer between the database and the clients. You can think of two main areas
that Neo4j GraphQL is focused on. One is schema augmentation: taking the GraphQL type definitions and adding all of the CRUD operations — filtering, ordering, input types for the mutations, things like that. The other big area is GraphQL transpilation: taking a GraphQL request and generating the database queries, in this case Cypher, to resolve those requests. I think we have a few minutes left, so let's take a look at some code to see how this works. This is a link to a CodeSandbox — if you haven't seen CodeSandbox before, it basically allows us to run some JavaScript code, either Node or client code, on someone else's containers in the cloud somewhere. This one is just pointed at a GitHub repo, so you can also see the code there on GitHub, but this will be a good example to look at. First of all, here are the environment variables for connecting to a Neo4j instance. This says "uk companies" — I first set it up for a different project — but it's pointed at a database hosted in the cloud somewhere that has business reviews, the same data set we were looking at earlier, and this is just a read-only user. It's probably not a great idea to check your database credentials into GitHub, but I wanted to do that for this demo; since it's a read-only user, don't worry about making changes to the database — you won't break anything. Here we have a GraphQL schema: type definitions describing users, reviews, businesses, and so on. We have this recommended field that has the cypher directive on it, with our custom Cypher query, but other than that we don't have any custom logic here. One thing I haven't talked about is this at-relation directive: because the property graph model encodes the relationship type and the direction of a relationship,
we need to be able to model that in our GraphQL types as well. That's one additional thing we add to be able to represent our property graph. If we look at index.js here — what are we doing? We're pulling in this makeAugmentedSchema function from neo4j-graphql-js, and then Apollo Server, the Neo4j JavaScript driver, and a couple of utility packages as well. We read our environment variables from that .env file we just looked at, read the type definitions from the schema.graphql file we were just looking at, and pass them to makeAugmentedSchema, which gives us an executable schema object that we can then pass to Apollo Server. The other thing we're doing here with Apollo Server is injecting our driver instance: we create the driver instance using the environment variables to connect to our Neo4j database, and inject it into the context. That means the driver instance is now available in every resolver in our GraphQL server, and then we serve that with Apollo Server. But there are no resolvers here — we haven't written any resolvers. Typically, to generate an executable GraphQL schema, we have to take typedefs and resolvers and combine them together, but that's what makeAugmentedSchema from neo4j-graphql-js is doing: it's generating those resolvers for us, so we don't have to write them. If we go to our GraphQL API — let's pull this out into a window so we can see it a bit better — and look at the Docs tab, we can see we have query fields generated for each of the types we defined, with arguments for filtering, ordering, that sort of thing, and also mutations for creating, updating, and deleting nodes and relationships. So if we run a GraphQL query — let's simplify that a little bit — searching for businesses, their name and address, and what category they're in, I get back some data.
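The shape of the schema.graphql discussed above can be sketched roughly as follows. The `@relation` and `@cypher` directive syntax is in the style of neo4j-graphql-js, but the type names and the Cypher body are illustrative, not the actual file from the sandbox:

```graphql
type Category {
  name: String
}

type Business {
  businessId: ID!
  name: String
  # @relation encodes the relationship type and direction from the property graph
  categories: [Category] @relation(name: "IN_CATEGORY", direction: "OUT")
  # @cypher attaches custom logic; `this` refers to the current Business node,
  # and field arguments like $first are passed as Cypher parameters
  recommended(first: Int = 3): [Business]
    @cypher(
      statement: """
      MATCH (this)<-[:REVIEWS]-(:Review)<-[:WROTE]-(u:User),
            (u)-[:WROTE]->(:Review)-[:REVIEWS]->(rec:Business)
      WHERE rec <> this
      WITH rec, count(*) AS score ORDER BY score DESC
      RETURN rec LIMIT $first
      """
    )
}
```

Everything else — the query entry points, filters, ordering, and mutations — is generated from these type definitions by makeAugmentedSchema.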
If I switch back to my CodeSandbox and look at the logs here — this can be a little hard to read in CodeSandbox — it's logging this generated Cypher query. This query is the exact query we need to resolve this data from the database: matching on businesses, finding name and address, traversing out to find what category they're in. And as I add more complex fields to the selection set here — so now I'm adding: okay, find recommended businesses for each one — if I go back and look at the logs again, I can see that the query I added as a cypher schema directive is now embedded in the generated Cypher query as a sort of sub-query. So the benefit here is, one, I didn't have to write any resolvers to get this GraphQL API — I just define my type definitions, and neo4j-graphql-js takes care of generating both the schema, so the entry points for the API, and the resolvers — but also, at query time, I'm generating a single database query for any arbitrary GraphQL request. So I don't have this N+1 query problem of making multiple round trips to the database; instead, it's all resolved in a single request. Cool. I think we're out of time. There's a bit more in the slides that we didn't have time to cover, diving into how these GraphQL database integrations work, how they use the resolve info object to generate these database queries, and there's also a little bit in the slides about some of the low-code tooling that's built on top of this core integration, and some resources for learning more about the things we talked about. But that's the basic idea, I think, of Neo4j and using the Neo4j GraphQL integrations to build GraphQL APIs. Do we have time for any questions, or should we just leave it there? No, no — of course we have time for questions. Thank you so much, Will, that was amazing, and you can see people are giving virtual claps in the chat.
One of the downsides of a virtual conference is that you cannot hear the clapping unless people actually unmute, and most of my students prefer to write their claps in the chat. And yes, of course, we have time for questions. Great. Okay, I have a question about the homework — not this homework, but I'm planning on giving a special Christmas homework, which is all optional, all about deployment. Given that students may want to use Neo4j and GraphQL: is there an easy way to deploy Neo4j? Maybe there's a managed hosted database they can use for development and testing purposes? Yeah, so for the purposes of homework, development, and testing, probably the easiest thing to do is use Neo4j Sandbox. Let's log out here — Sandbox is what I want; I'll link it in the chat. Neo4j Sandbox allows you to spin up a Neo4j instance that is hosted for you. You do need to sign in; the reason is that these instances are private to you. We can spin up a blank one, or one with existing data sets loaded — those come with guides that walk you through how to write queries specific to that data model, and so on. But if you're building an application and doing some testing, you can create a blank sandbox that has no data in it, and once it spins up, I can see the connection credentials. So when I'm using the Neo4j drivers to create a connection and query it from application code — here we go — this Bolt URL is what I'm going to plug in; that's the connection string for the database, and then I have a username and password here. So this is a really good way, I think, if you're just developing and testing. If you're putting something into production,
There are other options for production: Neo4j Aura is the Neo4j database as a service, and there are also options on AWS and Google, but Sandbox is free. You can create multiple of these too, so I can also spin up, I don't know, the movies dataset or whatever. These sandbox instances are temporary, though; that's the only downside. By default they live for three days, but then you get a pop-up and you can extend them for another week, so they live for ten days. If one dies, just spin another one up as you're working on your homework. And then there's a question in the chat about connecting from the application. Yeah, so there are the connection credentials here, but if you look in the Drivers tab there are also code snippets for using the different language drivers as well. There's one for GraphQL too, which will show you how to spin up a GraphQL API connected to your database, and you can also run that on CodeSandbox.

Maybe I should add to that, since I was thinking about the deployment and Nida was asking this question: for local development it's probably best to use, I don't know, Docker, or install it locally or whatever, because then you can also run tests and it's isolated from your co-workers. Once you deploy the application to the web and want to show it to your friends and family and so on, it can be really handy to have such a managed database. There are many options possible. You can also install Neo4j locally without any problems, or at least you shouldn't have any problems, right?

Exactly, yeah. So if you go to the neo4j.com download page, the default local option is the Neo4j Desktop application, which has some benefits: one, I can manage a bunch of different projects and databases locally, and also connect to remote databases. But one thing we didn't talk about:
what we have in Desktop are these things called graph apps. Graph apps are like single-page applications that connect to a database in Desktop. So I'm launching this one called GraphQL Architect; it's going to restart because it had a dependency that it has to install. These graph apps are really neat because they cover things like graph visualization. Here's the Graph tab: if I look at Neo4j Bloom, for example, I can visualize my graph, and Bloom is different from Neo4j Browser in the sense that it's optimized for exploring your data without writing Cypher queries. It has this sort of natural-language way, I guess, of representing patterns in the graph, so I can work with my data this way as well. Anyway, there's also this tool called GraphQL Architect; it's a graph app that makes it easy to develop and query my GraphQL APIs locally in Desktop without writing any code. So I can generate these type definitions (these were generated automatically from the database when I started this application up), and then I can edit this if I want to add my custom fields, you know, with @cypher directives and whatnot, and then I can query them using GraphiQL, and I can export this to a GitHub project, and so on. So anyway, that's one benefit of using Neo4j Desktop: it has all of these cool graph apps that I can install that give me different functionality.

All right, nice. Any more questions? It doesn't look like it, so we're a bit over time, but I think this is okay for now. I'm going to take over as presenter for some minutes just to show the homework, which is going to start today. So thank you again, Will, and you can stay with us if you want.
Actually, it would be great if we can also, yeah, I'd like to answer some questions which came up. Maybe you're even interested in what our homework looks like.

I'm very interested in what the homework is.

Okay, so we have this homework repository; it's the fifth exercise, starting today. You can see the deadline is way in the future, so you have more than two weeks; that's because of the Christmas holidays. If you want, you can ask our mentors, that's Juri and me, for a feedback review, so we can give you intermediate feedback on your pull request. And this time it's about basically adding persistency to our GraphQL setup. So far, in exercises one, two, three, four, you were working with Apollo Server and basically just held the data in memory: if your server stops, then the data is gone. Now those mutations will write data into the database, and it stays persistent. And the whole goal of this is, well, first of all, that the data is persisted, but also that your software tests stay autonomous and don't have side effects. That is, if you write data into the database, you should delete the data afterwards, or even consider mocking the actual database call. Second, you should make sure that the database is not left in some invalid state. So if you, for example, need to set up data in multiple mutations, just make sure that it's not possible to leave, say, a post without the author associated with it, and these things. And my question to you, Will: Juri made me notice that in the lecture I was referring to tests that have side effects as "non-atomic" software tests.
Apparently, there is a conflicting definition of non-atomic software tests. I was looking at, I think, this blog post where they were talking about order independence, causing no side effects, but there's also another one by Sauce Labs, and they call these "autonomous" tests and use "atomic" for something else. So autonomous is what atomic is according to the definition of the other blog post, and here they say atomic means you cannot subdivide those tests anymore, that each one is testing just one thing. And I would be interested: do you know the problem, or do you know what I'm referring to? Do you know if there's an unambiguous term for that, or which one do you prefer?

Yeah, I mean, thinking of how I've referred to these things, there are two distinct problems here that you're pointing out. When you have a test suite, especially for unit tests, you want them to run very, very fast, because oftentimes you want to be able to run your unit tests as you make code changes, and you don't want to wait five minutes or something for your tests, especially the unit tests, to finish. So oftentimes you want these tests to run in parallel, and that's where, I guess, that one blog post is referring to "autonomous": you don't want the order to be dependent, so there should be no side effects in one test that another test then depends upon. Like, if you have a test that makes a change, maybe in a database, you don't want a test following that one to be dependent on that change. You want to be able to run all of these tests in parallel, and in my mind the main benefit of being able to run your tests in parallel, rather than serially one after the other, is speed, because as I'm writing code I want to be able to run my unit tests several times as I'm making changes.
I don't want to wait and get sucked into looking at Twitter while I'm waiting for my test suite to run, or something like that. So I think that's definitely one aspect. And then the atomic aspect, which is that I want to be testing only one thing in a test. Let's say, in our GraphQL example, maybe I'm testing ordering, that I'm ordering results by some key. I don't want to test ordering and filtering in the same test, necessarily, because if that test fails, it's going to take me more time to figure out: is it ordering that's broken? Is it filtering that's broken? That sort of thing. So yeah, I think those are two important, and I guess maybe distinct, aspects. I don't know, in my mind I'm not usually so concerned about terminology; I'm more concerned about the concepts, I guess. But yeah, I think both of those are valid.

Yeah, you're bringing up something which is really important, because I'm mentioning in the exercises that you have two options: if you run the tests and those tests write data into the database, you should make sure you clean the database, or you can consider not writing any data at all by mocking the response of the database. This obviously will also increase the speed of your tests, and you get parallelization, because you don't have to run those tests in succession, waiting for the first test to be finished before running the second; you can run all of them in parallel, because they don't write data into the database. This is another advantage.

Yeah, awesome.

Yeah, I wanted to bring that up because I was talking about it in the lecture and wasn't a hundred percent sure I was actually using the right words. Great. So basically, everything should stay the same feature-wise, like in exercise four.
In exercise four, I defined a particular schema that needs to be implemented, and we just had a Post and a User type. We're implementing something like Hacker News: users can write posts and they can upvote things, right? So you have this write and upvote mutation, and this should stay the same. And my idea for the persistency part was, I mean, there are so many ways to add persistency to GraphQL. As you saw in Will's lecture, the traditional approach would be to have some data source. There's even this pattern in Apollo Server, data sources, which you can put into your context and then call in your resolvers, sometimes leading to the N+1 problem, of course. And there are so many ways, right? You even have data sources for particular databases, and we could have focused on that, or on how to wrap a REST API with your GraphQL, which is a very common problem you will face if you have a legacy REST API. But Juri and I decided: okay, let's focus on a particular advanced feature, which is not something you would come across immediately when you start working with GraphQL, but which is really interesting and versatile, and that is GraphQL schema stitching and schema delegation. And we decided to give you two options. Schema delegation traditionally works like this: there is an existing GraphQL API somewhere on the internet, and you augment this remote GraphQL API with your own resolvers. So you can feel free to use a remote GraphQL API from some headless CMS, for example. I have experience with GraphCMS, which exposes a GraphQL endpoint; you basically just put your own code in front of it. That option is without Neo4j, right? My experience from last semester was that some students feel uncomfortable or overwhelmed with the complexity of Neo4j, so this is also something for you.
If you prefer that option, I guess it's a little bit easier for the homework, but what we're covering in the lecture is Neo4j. (Well, that was one click too many.) And there are also, as you saw, many ways to do that. I would suggest you write Cypher just for fun, because you want to learn this query language for graph databases. So you can use neo4j-driver yourself, play around with Cypher a little bit, and see: what are my options, how does Cypher differ from SQL, and so on. I also mentioned Neode, which is like the OGM (object-graph mapper) equivalent for graph databases; you don't have to use it, but you're free to. And also, someone created this pull request at neo4j-graphql-js showing how to use schema stitching, and you can find code there for how to set that up for Neo4j. So if you want, feel free to check this code out and play with it. Here you can see, I mean, the most important part is probably where we call neo4j-graphql-js. This is how the resolvers would look: you can use delegateToSchema and delegate your calls to neo4j-graphql-js. There are also already-existing tests for it, and this is probably one of the most interesting parts, where we use neo4j-graphql-js and put our own code in front of it. So you can check that out as demo code. It's supposed to give you a nice and convenient setup, so you don't have to worry about wiring everything up. There's also demo code for the other architecture, let's say, and it's just a branch in the same repository. So this is the demo branch, and this is the explanation of how you would use a remote GraphQL API with the same technique; you can also use schema delegation for that. But here you have to do a little bit more manual work: you have to sign up somewhere and create the schema, you get a UI and click your schema together, and yeah.
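Coming back to the write-some-Cypher-for-fun suggestion: a first query over a hypothetical User/Post model (the labels and properties here are made up for illustration, roughly matching the Hacker News homework) could look like this.

```cypher
// Find the five highest-voted posts and who wrote them.
// The (a)-[:REL]->(b) pattern expresses what a JOIN would in SQL.
MATCH (u:User)-[:WROTE]->(p:Post)
WHERE p.votes > 10
RETURN u.name AS author, p.title AS title
ORDER BY p.votes DESC
LIMIT 5
```

Comparing this to the equivalent SQL (a self-describing join table, GROUP BY, and so on) is a good way to get a feel for where the graph model pays off.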
So I added some screenshots of how the schema looks, and then you get this API endpoint plus a token, which you will need to connect your own Apollo Server with it. So this is about the homework; there are some optional tasks if you want. And Juri, do you think I should bring up anything else? Maybe git-crypt?

Yeah, I think you described git-crypt. Maybe you can say one or two words regarding that, but I think in general students need time to have a look at the homework themselves sooner or later.

Yeah, so you saw what Will was thinking about: it's not the best idea to put credentials into the repository. And it's funny, because I made that an objective. So in case you have API keys, put them into an encrypted file; you can use git-crypt for it. It works like this: you basically rebase on the last homework's main branch, and in this branch we have, let me see, I'm going to click on that and go to the previous commit. No, I think I need to click on "commits", right? Yeah, this is the list of commits. So this is the commit where I added an encrypted file, which you can see here; it looks like a binary file. But if you put the git-crypt key, which I added to our Moodle course, in a particular location, then you can decrypt those encrypted files. So download the git-crypt key from Moodle, put it at the location given in the homework instructions, and then run a particular command; then you should be able to read this file. And if you want to add your own encrypted files, because you have, say, a .env file and you want to encrypt that, you would just add it here: you specify the location. So this file is at the root level, it's called ".hello-encrypted-world", and you have to add those parameters here, and that's it. If you do that first, and then add your files, then they get encrypted.
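For reference, "those parameters" are git-crypt filter entries in the repository's .gitattributes file. The file names below are just the examples from this walkthrough as I understood them, not anything you are required to use:

```
# .gitattributes -- tell git-crypt which files to keep encrypted
.hello-encrypted-world filter=git-crypt diff=git-crypt
.env filter=git-crypt diff=git-crypt
```

After placing the key downloaded from Moodle somewhere locally, `git-crypt unlock <path-to-key>` decrypts the matching files in your working copy.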
Locally it looks unencrypted, so on your own machine you can see it, but when you commit and push, those encrypted files should show up as binary files. All right, does that make sense? You only have to do that if you have API keys; if you end up with an implementation which doesn't need API keys, fine, that's probably the best, and then you don't have to worry about it. Is it understandable so far? Any questions regarding the homework? Silence... people are writing in the chat, so I'm giving it a little more time. "I have to read through the instructions first." Oh yeah, sure. After the lecture we usually have live pair programming with every group, so Juri and I, the mentors, go around and help everyone carry out the homework. That's just for you, Will, because you might be interested in how this happens. Well, so, if there are no questions, I would say... okay, someone is writing. Will, do you have any questions?

No, that looks like a good assignment, I guess. I was thinking of different ways I could go about implementing that. I'm glad you mentioned the git-crypt example; I wasn't familiar with that one. I've seen other ways of managing secrets, but that one looks pretty straightforward, looks quite nice.

Well, if that's it, I would say let's call it a day, and thank you again so much, Will, for this amazing talk. I hope more people feel comfortable now using graph databases in a real production-ready application. It was amazing, I really enjoyed it, thank you so much.

Cool, thanks for having me, thanks everybody for coming.

Great, so we see each other in the seminar. People are saying thank you. Right, cool, and I'm going to stop the recording. And I can see that at some point my computer crashed, or OBS crashed, and it wasn't recorded fully.
I think I got it, yeah.

That's great. It happened to me last time as well: I didn't touch my keyboard, didn't do anything, and I could see from the icon that OBS wasn't recording. So, yeah, thank you very much everyone, see you in the exercises, I guess. Yes, goodbye again, Will!