Routing With Graph Algorithms
Building A Travel Guide With Gatsby, Neo4j, & GraphQL: Part 4
Using pathfinding algorithms from Neo4j's Graph Data Science library to find efficient routes between points of interest in OpenStreetMap data. We also use the NeoMap graph app for Neo4j Desktop to visualize spatial data, including our routes.
Links And Resources#
- Code on Github
- Install Neomap for Neo4j Desktop
- Neo4j Graph Data Science documentation - path finding algorithms
How's that, there we go, that looks better. 02:40 Sorry about that, 02:41 looks like it had some audio connection there, 02:44 but hopefully you can hear me now. 02:48 So welcome to the Neo4j stream, 02:50 this is the GraphQL and GRANDstack edition. 02:55 We have been working on building travel a guide 03:00 using Neo4j, Gatsby and GraphQL. 03:05 And we're going to continue working on that project today, 03:11 picking up from where we left off last time. 03:14 But before we get into that, 03:17 I want to mention a couple of things 03:21 that may be of interest. 03:22 So the first thing I want to talk about 03:25 is the upcoming NODES conference, 03:31 that is the Neo4j Online Developer Expo and Summit. 03:36 This is an online conference, 03:38 all about Graphs and Neo4j. 03:41 The first one ever was last year in October. 03:46 So this is the a second iteration 03:51 of this conference for 2020. 03:53 It's a free conference, it is October 20th. 03:58 The agenda is out, so you can check that out. 04:03 I'll drop a link to this in the chat. 04:05 So be sure to register if you want to tune in 04:11 for all kinds of exciting announcements 04:15 from the Neo4j community developer, 04:18 deep dives on lots of interesting topics, 04:23 a few including GraphQL and GRANDstack, 04:25 but also things like graph data science, graph algorithms, 04:30 deep dive into how things work in Neo4j tooling, 04:35 how to use Neo4j with different frameworks, 04:38 all kinds of stuff there. 04:39 So that's the first thing I wanted to mention. 04:43 And the second thing I want to mention is, 04:51 let me find the short link to this. 04:53 Here we go, here's the short link for this. 04:56 So I think as a few of you know, 04:58 I've been working on a book for a wile awhile now 05:04 called Fullstack GraphQL, this is published by Manning. 05:08 It's an early release from Manning. 05:13 There's I think five or six chapters out now. 05:16 So almost done there. 05:18 But there is a excerpt of three chapters 05:23 that's available for free now, 05:26 if you go to this site, and you can download 05:31 first three chapters that talk a lot about 05:35 how to use GraphQL with Neo4j is kind of the focus on 05:41 this excerpt is kind of how do we build the backend 05:43 in the context of Fullstack GraphQL. 05:48 So if that's of interest, you can download that for free. 05:52 I want to do a sort of a book club on the live stream. 05:58 So I think, I think we will start that on the stream. 06:03 Maybe once we finish up with the Gatsby project, 06:08 we'll switch over to sort of a book club 06:11 where we work through the exercises in this excerpt, 06:16 so the first three chapters that are in there. 06:20 So that's an interesting, 06:23 download the book and, you know, stay tuned on the stream 06:26 for when we can go through that as a book club. 06:31 Okay, cool. 06:32 So let's get into it then. 06:36 So again, as always, if you have any questions 06:41 as we go through this, feel free to drop it in the chat. 06:45 It doesn't necessarily have to be what we're talking about. 06:48 You can ask about anything, and we'll dig into it. 06:54 So picking up from where we left off last time. 06:58 So I have the read me open here, 07:02 let's take a look on GitHub. 07:04 GitHub, Johnny Montana, I think I called this Central Perk. 07:09 Yeah, there it is. 07:10 So here's the link. 07:14 So this is what we've been building. 07:16 So in the first session, 07:22 we talked a little bit about the data set that we're using, 07:27 which is this open street map, 07:30 import of data of Central Park in New York city 07:34 that's available as a Neo4j Sandbox. 07:38 And if you don't have that, 07:39 here's the link to download that. 07:46 Sandbox, there we go. 07:49 This one is, it's still kind of a hidden Sandbox. 07:53 So you'll want to use this link that has the 07:56 use case equals open street map, 07:59 query param in there so that it will unlock the, 08:04 this open street map Sandbox, kind of a direct link to that. 08:06 There's lots of other data sets, 08:08 but this is the one that we're using with Sandbox. 08:11 So anyway, so the first session we looked at that. 08:13 We looked at how to create a GraphQL API 08:16 using this data in Neo4j Sandbox 08:20 using Neo4j graphic LJS. 08:24 So that is here. 08:31 So this is a node JS GraphQL server that we built 08:36 just by defining our type definitions, 08:41 and using the power of Neo4j graphic LJS to spin that up. 08:45 Then at the same time, we created a Gatsby project 08:50 using the Gatsby blog starter. 08:52 Then the next session we looked at, 08:54 how do we pull data into Gatsby from the Neo4j GraphQL API. 09:01 That was part two to populate our pages. 09:05 So basically what we have our points of interest 09:06 from open street map, and in our Gatsby site, 09:09 we want to build sort of a detail page 09:12 for each point of interest that has pictures, 09:14 information, that kind of thing. 09:17 So in the last session last week, 09:19 we looked at how do we create custom resolvers 09:24 in our Neo4j GraphQL API to one call out to Mapillary. 09:31 So the Mapillary API is kind of like 09:34 a crowdsourced Google street view. 09:36 It allows us to find images that are showing certain areas. 09:44 So we pull in those Mapillary photos, 09:46 and then we saw a different way to add custom logic 09:49 to our GraphQL API in this case, searching Wikipedia, 09:55 using apoc. load. json within Cypher. 09:59 So that was pretty neat because that allowed us to add 10:01 Wikipedia data to our Gatsby project. 10:07 So let's start this up. 10:08 So I'm going to go to the Neo4j GraphQL directory, 10:15 and start our GraphQL API. 10:19 So that's going to start this project, 10:23 on local host 3003. 10:30 So this allows us to do things like querying for 10:37 points of interest. 10:38 Let's grab the first, I don't know, the first 10 here. 10:45 Make sure that works. 10:46 Yep, so that's pulling data down from my Sandbox instance, 10:50 which I have running here. 10:52 So I'm connecting this GraphQL API that I'm running locally 10:55 to this Neo4j Sandbox instance 10:58 running in the cloud somewhere with my open street map data. 11:04 And now, let's open a new terminal tab 11:08 and CD into Gatsby site, and we'll do a Gatsby develop. 11:17 And what this is going to do, 11:18 so this is going to pull in this Neo4j GraphQL API, 11:23 pull that into the Gatsby GraphQL API. 11:31 It looks like we had an error there in our GraphQL query. 11:37 We'll have to figure that out in a second. 11:47 Okay, let's see what mistake we made here. 11:55 Maybe let's do a Gatsby clean to clean the cache. 12:09 I don't think I've changed anything 12:11 since we left off last time, but, 12:14 yeah, it looks like we do so it doesn't like photos. 12:19 Could we not have photos? 12:21 We added photos, first one. 12:36 (typing on keyboard) 12:46 Ah. 12:50 Okay, so a few things going on there. 12:51 So one thing, if you notice, when I didn't include 12:59 latitude and longitude 13:04 in my selection set, I get an error that says, 13:08 cannot read property longitude of undefined. 13:12 But when I add location latitude and longitude, 13:15 now I'm getting a different error 13:17 this is, I think, a Mapillary API key 13:19 that will resolve in a second. 13:21 Let me stop the Gatsby build. 13:25 The reason for that, if we look at the photos resolver here, 13:33 is, it is assuming that we have a location dot longitude, 13:41 and location dot latitude fields that have been resolved. 13:47 So those are kind of a dependency to be able to 13:50 fetch photos from the Mapillary API. 13:53 So what we really should do, there's a way that you can 14:01 declare fragments to be a dependency. 14:03 So it's sort of a GraphQL transform 14:05 to add fragments to a request. 14:11 We should take a look at that. 14:12 What I did, that we didn't run into this before 14:17 is because in the query that we're using in Gatsby, 14:20 so let's jump over there. 14:22 The query that we're using in Gatsby, 14:25 if we look at the source template, blog posts, 14:30 and look at this query. 14:33 So in this query, 14:34 we're pulling in latitude and longitude already. 14:39 So there's only one query and Gatsby 14:42 where we're pulling in photos, so that's being included 14:47 in the blog template component. 14:50 So we should look at how to fix that properly, 14:53 but as long as we include latitude and longitude, 14:54 we don't get that error in the Photos resolver, 14:59 but the error we have now, 15:02 when we call out to the Mapillary API, 15:07 so back in our Neo4j GraphQL project. 15:14 When we call it to the Mapillary API, 15:17 we need a Mapillary key from reading, 15:23 from an environment variable. 15:24 It looks like we maybe forgot to define that. 15:26 So I have a dot EMB file, 15:28 oh yeah, so somehow that got removed. 15:31 So this is what I think Mapillary key equals, 15:37 let me sign in to Mapillary here. 15:44 I'll do that down here, 15:46 so don't have to look at my keys 15:50 for all of my Mapillary projects. 15:55 Not really that big a deal. 16:02 Let's see, we're developers. 16:06 They want the client ID, copy that. 16:11 Well, you get to see it anyway, that's fine. 16:15 Okay and then I think we need to restart 16:20 our Neo4j GraphQL API. 16:26 And now, if we add photos, 16:31 should call it to the Mapillary API, find some photos. 16:37 There we go, now we've got some photos. 16:39 Okay, cool. 16:40 So, back to Gatsby. 16:47 Let's do a Gatsby develop. 16:54 Now take a sec to build. 16:57 If you remember, 16:58 when we added the call to the Mapillary API, 17:00 that slowed down our builds. 17:02 'cause we have to make a request for each point of interest 17:05 as we're building all of these pages. 17:10 So while we're waiting for that to build, 17:12 a question from the chat, does Neo4j have a Docker image? 17:17 Yes, Neo4j does have a Docker image. 17:21 If we look at YouTube.com developer, 17:32 and the developer guides. 17:38 And if we search for Docker, 17:45 there is a developer guide on using Neo4j with Docker. 17:51 So Neo4j.com/developer/docker. 17:53 So there is an official image. 17:57 There's some guides here that talk about how to use 18:03 Docker with different kinds of configurations. 18:06 You can even install plugins, and so on. 18:10 So lots of options there. 18:13 There's also, 18:14 so if we take a look at the Neo4j labs page, 18:21 so Neo4j labs is a team that builds 18:26 sort of more experimental integrations 18:30 and extensions for Neo4j. 18:32 It's actually the team that I work on at Neo4j. 18:35 So these labs projects, they're not, 18:39 not yet, I should say, 18:40 officially supported pieces of the product. 18:43 They're more experimental, with the idea that 18:46 once they're sort of validated and then incubated, 18:48 then they move on to become official parts of the product. 18:52 The thing I want to mention there, 18:53 because you asked about Docker is Neo4j-helm. 18:56 So if you're interested in using Neo4j in Kubernetes, 19:03 so basically orchestrating Docker containers, 19:08 there is a helm chart for using Neo4j, 19:11 which allows you to deploy a cluster, 19:16 a Neo4j cluster in Kubernetes. 19:18 So anyway, since you asked about Docker, 19:20 that might be of interest. 19:27 Okay. 19:28 It looks like our Gatsby build finished with a few warnings, 19:33 'cause I left a few things out. 19:38 So we can clean those up later, that's fine. 19:40 So let's go to local host. 19:44 8,000. 19:46 And so if you remember what we did is we built 19:50 this index page that just lists every point of interest 19:53 from the database. 19:54 And then we have a detail page where in the URL, 19:58 this slug is the open street map note ID for, 20:05 for that point of interest. 20:07 So anyway, so here's our picture 20:08 of the point of interest in this case, 20:11 this reservoir from Mapillary. 20:13 The details we have from open street map, 20:17 and then pulling in data from Wikipedia 20:21 for our travel guide. 20:23 Okay, cool. 20:24 So that's where we left off last time. 20:26 And what I want to do next is 20:30 start to add some maps to our project. 20:34 So I think we want to have a big map on this index page 20:39 that basically has, you know, 20:40 here's a map of Central Park, and markers for all of these 20:44 points of interests so we can see where they are 20:46 and sort of click on a marker, 20:48 and it links to one of these detailed pages, right? 20:54 So that's the goal for the index page. 20:57 And then for these detail pages, 21:02 what I'd like to do is have, 21:05 have a map that shows 21:09 where this specific point of interest is, 21:11 but then allows you to 21:14 find a route to your next point of interest. 21:17 So, show you the path for the next point of interest 21:23 that you want to go to, depending on what you select. 21:27 So that's what I want to start working on 21:29 is this map functionality. 21:33 But first we need to take a look at routing. 21:38 So how can we go from this three dancing maidens statue, 21:45 maybe I want to go to next, what would be a fun one, 21:51 maybe Crime Busters Rock. 21:54 Sure. 21:56 So I want to be able to see a map, 21:58 and show me a path on that map of the best way to get there. 22:02 If you've ever been to Central Park, 22:04 you'll know, there's a lot of little 22:08 bike paths and sidewalks and 22:11 paths through the woods and streets. 22:12 So it's pretty easy to get lost there. 22:16 So we definitely want to have 22:18 some routing to tell us where to go. 22:23 So that's, I think what we'll, what we'll focus on today is, 22:28 let's update our GraphQL API to have better routing 22:34 between two points, any given two points of interest. 22:40 And we did this a little bit, if you remember. 22:45 Let's say we want to go, let's return the node, 22:50 OSM ID in this GraphQL query as well. 22:55 We can take the photos out, we don't get the photos. 22:57 Make this a bit bigger. 22:59 So let's say we want to go from 23:02 the Jacqueline Kennedy Onassis Reservoir 23:06 to this Robert Burns statue. 23:10 So I'm just going to copy that node, open street map ID. 23:14 And if you remember, we added this route to POI. 23:30 This route to POI field that then, 23:37 so it takes a POI. 23:43 And then returns a list of these 23:52 latitude longitude objects. 23:55 It is complaining about, 23:57 oh, expected an integer, right, 23:59 but we actually want this not to be an integer. 24:07 We want this to be of type non-billable ID. 24:13 If you look at that, 24:14 it's the type we're using for the node OSM ID. 24:18 Not sure why we left that as an ENT. 24:22 Anyway. 24:23 Okay, now if we run this, 24:27 Oh, I must have, alright, just the first one. 24:30 So we're getting an error there, 24:31 'cause I was trying to calculate routes 24:35 for the first 10 points of interest, 24:37 which included the Jacqueline Kennedy Onassis Reservoir. 24:41 So we got an error saying I can't route to myself. 24:44 But if we search for just, 24:46 so find the first point of interest, 24:48 which is the Jacqueline Kennedy Onassis Reservoir, 24:51 and then route to this other point of interest by this ID, 24:55 which is the, the Robert Burns statue. 24:58 It gives me this latitude and longitude, 25:02 a series of steps to take. 25:05 So what we want to do in the application is 25:10 allow someone to select some other point of interest, 25:13 and then on the map, we want to use 25:16 these latitude and longitude pairs 25:19 to draw a path so they know where to go. 25:22 But first, let's look at 25:24 how this route to POI is being calculated. 25:27 So that is this field, let's make this a bit bigger. 25:35 There we go. 25:36 So that is this field, 25:38 route to POI on the point of interest type. 25:43 And we can see that it takes as an argument, POI, 25:48 which is the ID of the other point of interest 25:51 I want to route to. 25:53 And then it's going to return a list of step objects, 26:00 where step objects is just, for now anyway, 26:04 latitude and longitude pairs. 26:11 Okay, so then we add a Cypher schema directive, 26:18 and a Cypher schema directive allows us to 26:23 attach a Cypher statement to our GraphQL fields. 26:30 This is saying, anytime you request route to POI, 26:34 I'm going to execute this cipher statement 26:38 to resolve the data for this field. 26:42 And if we look at what we're doing, 26:44 we are matching on a point of interest node. 26:49 So finding the point of interest we want to route to. 26:52 So any time we use a Cypher directive, 26:56 and have an argument on that GraphQL field, 27:00 any of those arguments are passed into 27:04 the annotated cipher statement as parameters, 27:08 as cipher parameters. 27:10 So here, the argument is POI gets passed in 27:13 as a cipher parameter dollar sign POI is the syntax for 27:18 using cipher parameters. 27:25 Okay, and then once we match on that, 27:28 well then we use the shortest path algorithm 27:32 to find a path following these route relationships 27:37 to the other point of interest, up to 200 hops deep. 27:43 And once we find that, 27:44 then we just grab all of the nodes from that path, 27:48 and return this latitude and longitude object, 27:52 which matches this step object. 27:56 So step is not an actual type in the database. 28:00 We don't have any step nodes. 28:02 Instead, that's something that we're just projecting out 28:05 from this cipher query. 28:07 So we can do that with these cipher directive fields, 28:10 as long as we don't go too crazy 28:12 in the kind of nested selection that we want to have. 28:16 We can return objects 28:19 that map to types in our GraphQL schema. 28:25 Okay, so it would seem like, you know, 28:27 hey, we've got this shortest path function in Cypher. 28:33 We've got a route, 28:34 it's giving us the latitude and longitude. 28:36 That should be good enough, right? 28:38 Like why, 28:40 why don't we just put a map on this and call it a day? 28:43 And what I want to show is that 28:46 this shortest path function in Cypher 28:50 is giving us the shortest unweighted path. 28:56 So in the graph, it's looking at 28:59 the path from the Jacqueline Kennedy Onassis Reservoir 29:05 to the Robert Burns statue, 29:07 following the route relationships, 29:10 it's finding the shortest path 29:12 measured by the number of relationships to follow. 29:15 But really what we're interested in is 29:18 the shortest distance between those two nodes, 29:23 and each one of these route relationships 29:28 has a distance property. 29:30 So we know each step, basically the cost, 29:33 the weight of that relationship. 29:34 So to go from say, this point to this point, 29:38 I don't know, maybe that's 50 meters. 29:41 To go from this one to this one, that might be five meters, 29:44 and so on. 29:45 And the true cost is the sum of that path. 29:48 So unfortunately, the shortest path function 29:52 here that we're using is not giving us 29:55 the shortest weighted path. 29:57 It's just looking at the number of 29:59 relationships in the route. 30:03 So what we want to do is figure out 30:07 how we get the shortest weighted path. 30:11 But to do that, and to sort of prove to ourselves 30:16 that we're not getting the shortest path by distance, 30:21 what I want to do is take a look at 30:25 some tooling for Neo4j desktop that allows us 30:29 to visualize this spatial data, 30:33 including these routes that we're looking at. 30:37 So I'm gonna switch over to Neo4j desktop. 30:50 From the chat question to zoom in, 30:54 yeah, hopefully that was, 30:59 that was big enough here. 31:00 I think if I go any bigger, 31:02 I can't really fit much on the screen. 31:03 So hopefully that was, that was good enough zoom level. 31:06 And another question, 31:08 what about previous videos of the series? 31:10 Yeah, so if you look at the GitHub page for this project, 31:18 all of the live streams are recorded on YouTube, 31:24 and they're linked here. 31:25 And I won't click on one of these 'cause it'll start playing 31:29 and we'll be watching a stream from within a stream. 31:32 But you can find the recordings there. 31:35 And there's a YouTube playlist that has 31:39 all of the recordings for this project, 31:43 for the previous one we did, which was the real estate app, 31:46 and then there's another playlist for all of the 31:49 GraphQL and GRANDstack videos that we've been doing. 31:52 You can also, if you look through the commit history, 31:56 there should be, yeah, 31:57 so there's like one commit for each, 32:03 each one of the streams that we do. 32:04 So you can follow along that way too. 32:11 Cool. 32:14 Okay, so what I want to do is 32:15 figure out how I can visualize some of this spatial data. 32:22 So I'm gonna switch over to Neo4j desktop. 32:25 So Neo4j desktop has these things called graph apps. 32:32 So these graph apps are 32:35 applications that you can add into Neo4j desktop 32:39 that kind of enhanced the functionality 32:41 of what you can do within desktop. 32:44 If you haven't used desktop before, 32:46 you can think of desktop as kind of like mission control 32:50 for Neo4j. 32:52 So it allows you to have multiple projects, 32:55 and then within each project, 32:58 you can have multiple databases. 33:02 So I'm going to create a new project. 33:08 Let's call this Central Perk. 33:12 And I can create multiple databases 33:17 for each one of these projects. 33:18 I can download a database locally, 33:23 I can choose a bunch of different versions. 33:26 But it also allows me to connect to remote databases. 33:30 So if I'm hosting a database in the Cloud somewhere, 33:34 if I'm using the Neo4j Aura Cloud service, 33:38 I can connect these databases in desktop 33:44 to remote Neo4j instances. 33:46 And then I can use all the functionality 33:48 within Neo4j desktop, including these graph apps 33:52 with those remote databases. 33:55 And so that's exactly what I want to do, 33:58 is connect this database to my remote Sandbox database. 34:03 So let's call this my LSM Sandbox. 34:09 And if I jump to Sandbox, 34:12 I can get my connection credentials. 34:15 There's the bolt URL. 34:18 And we'll do username, password authentication. 34:22 So user is always Neo4j on these Sandbox instances, 34:26 and then here's my random password. 34:33 So this will now connect Neo4j desktop 34:40 to my remote Sandbox instance, 34:42 and I can open up Neo4j browser to prove this. 34:50 So let's do a, make this bigger, 34:54 let's do a call, 35:00 do that schema visualization. 35:03 And yep, here's my data model. 35:05 Got all kinds of open street map data in here, 35:08 including my routing graph, 35:10 which is what I'm most interested right now. 35:11 Cool, so that looks good. 35:14 You'll see this, 35:15 the summary of node browser guide 35:17 when you first connect to this. 35:19 That's because you originally used this data set 35:22 as one of the summary of nodes challenges. 35:25 So you've seen that before, that's where that comes from. 35:29 Okay, cool. 35:30 So that was Neo4j browser. 35:36 But if I click on this down arrow, 35:39 I can see all of these other graph apps that I can use 35:43 to work with my Sandbox instance. 35:47 You may not see as many of these, 35:49 you may need to install some of these 35:53 if you're not seeing them, 35:54 but you can open the graph apps gallery 35:59 to see what is available. 36:01 So the graph apps gallery, showing all of these graph apps 36:05 that you can install. 36:06 Most of these, you can install it just 36:09 one click here into Neo4j desktop. 36:12 So there's a ton of things here from the ETL tool, 36:16 for importing data, from relational databases, 36:20 Halin for, for monitoring your Neo4j cluster, 36:23 the Graph Data Science Playground, 36:25 which is kind of a low code environment 36:27 for working with graph data science. 36:30 Graphical Architect, 36:32 which we've taken a look at before in this stream. 36:35 This is for building and querying GraphQL APIs with Neo4j. 36:41 But the one we're interested in today is NeoMap, 36:45 which is this awesome graph app that allows us to 36:49 visualize spacial data that Estelle wrote and maintains. 36:56 So click install for this. 36:58 I'm not gonna click it, cause I've already installed it. 37:02 But if you click that, it will install NeoMap, 37:05 and then you will have that available 37:08 to connect to your Sandbox instance. 37:15 Cool. 37:16 So the idea with NeoMap is, 37:19 to basically allow us to create layers, 37:22 to visualize spacial data from Neo4j. 37:28 So I'm going to create a new layer. 37:31 And the first thing I want to do is 37:33 visualize all of the points of interest. 37:36 So let's call this layer POIs, 37:42 and we want to choose the type of the layer. 37:50 So if we're going to give individual 37:52 latitude and longitude values stored as 37:55 individual, like floating points properties, 37:59 or if we're using the point database type, which we are, 38:05 so we'll select that. 38:06 We can also define layers with Cypher queries, 38:09 which we'll do in a second when we start looking at routing. 38:12 But for now, we want just points. 38:17 So what labels, we want to look at 38:20 all of the point of interest labels, 38:24 restoring the point location data in the location property, 38:30 and then for the tool tip, we'll show the name of the, 38:37 of the point of interest. 38:39 For the maximum number of nodes, well, 38:41 I think we have like a hundred and some points of interest, 38:44 so we'll set that at 200 and they'll show us everything. 38:48 And we want markers. 38:51 Okay, so we'll click update map. 38:54 Zoom in on Central Park. 38:56 And after all those markers are added, 39:00 we can see our points of interest on the map. 39:05 Cool. 39:06 So here's Central Park. 39:08 I can zoom out a bit. 39:09 So here's New York city, 39:14 here's Manhattan, and Central Park is 39:17 this giant park in the middle of Manhattan. 39:20 So all of our points of interest are within Central Park, 39:24 since that's where we trimmed 39:26 the open street map data import that we're using. 39:30 Cool. 39:31 And I can click on some of these 39:32 and see the name of the point of interest. 39:37 There's a big cluster here. 39:39 What is this? 39:40 Oh, this is the, it's a bunch of tennis courts. 39:44 If you remember, in our Gatsby blog, 39:49 we had a whole bunch of points of interest called pitch. 39:53 So I guess each one of these courts 39:55 has its own point of interest called pitch. 39:59 Here's the Jacqueline Kennedy Onassis Reservoir. 40:02 Cool, okay. 40:03 So we have our points of interest marked on the map, 40:12 so we can sort of see where these things are 40:14 that we're talking about, so that's cool. 40:16 And what I wanted to prove to you 40:19 is that the Cypher shortest path function 40:23 is not giving us the optimal path. 40:27 That instead, it is giving us the shortest path 40:34 measured by number of relationships, 40:38 but not by distance of the path necessarily. 40:42 So let's take a look at that, 40:44 and see how we can visualize the path for that route. 40:51 So, here's the cipher query we were using 41:02 for the shortest path function. 41:05 So let's jump into Sandbox. 41:08 We'll jump into Neo4j browser, 41:15 just to get our query right. 41:21 Okay, so. 41:25 So, first let's, match on two things here. 41:31 So a point of interest, let's look up by name. 41:33 That should be a bit easier. 41:35 So we're going to say match node A, 41:39 or the name is something, 41:40 and node B, or the name of something else, 41:44 and then, find the shortest path between A and B 41:53 instead of this and other, 41:54 and then grab all the nodes in the path, 41:57 and return latitude and longitude. 42:01 Now we're gonna to use this query in NeoMap, 42:04 and it expects us to return a latitude value. 42:14 So we're going to alias this to latitudes, 42:18 and N dot location, dot longitude as longitude. 42:27 So with GraphQL, we want to return 42:30 an object with those values, 42:32 because we're mapping that to that step type 42:37 in our GraphQL schema that we looked at earlier. 42:39 So we want to match the shape of that object. 42:43 Both NeoMap, 42:44 since we're going to visualize these in Neo map, 42:46 we want a latitude and longitude value in the return. 42:51 Okay, if we run this, you're gonna get nothing back 42:53 because we need to first map, 42:56 match on these two points of interest 42:58 that we want to go from and to. 43:01 So, we were looking at going from 43:05 the Jacqueline Kennedy Onassis Reservoir 43:10 to, oh I don't have the name here, I have the ID. 43:16 I can find that. 43:18 Was it, I think the Robert Burns statue? 43:25 Let's look that up. 43:26 So I'm just switching to a different Neo4j browser instance, 43:30 just to look up the name of that point of interest. 43:36 Well, I guess we could do it in GraphQL, couldn't we? 43:38 Let's just do that. 43:42 Point of interest node OSM ID is this. 43:51 What is the name? 43:53 Robert Burns. 43:54 Not Robert Burns statue, just Robert Burns. 43:56 Okay. 43:58 And different on that, cool. 44:01 That should be the same as this query. 44:15 Cool. 44:16 So this is now the same route that we were getting 44:19 in our GraphQL API, just in the Neo4j browser. 44:23 And what I'm going to do is copy this query, 44:27 and switch to NeoMap. 44:33 And now what I want to do to visualize this path, 44:37 I'm going to add a new layer, 44:43 and we'll call this, shortest path. 44:48 And the layer type is now going to be advanced Cypher query. 44:53 This is the Cypher query that I want to use 44:55 to define the layer. 44:58 And this is going to be a polyline, 45:01 'cause I want to see the path drawn. 45:04 Let's pick a different color, 45:06 something that stands out so we can see this. 45:11 Kind of red and green and stuff, 45:14 should we just do dark red? 45:18 Yeah, how about that. 45:20 Okay, let's see if that works. 45:25 Okay, cool. 45:26 So here's our path from 45:31 the Jacqueline Kennedy Onassis Reservoir, 45:35 looks like we get on this running track. 45:39 And then you see this isn't quite perfect, 45:41 'cause we're jumping across the water here. 45:45 But you can see that it does continue along the edge here. 45:52 Okay. 45:52 And then we're sort of making our way on 45:58 East Drive, and some trails to get to this one, 46:06 which should be our Robert Burns statue. 46:11 Okay, that actually doesn't look too bad. 46:17 So that might actually end up being an okay path, 46:21 but let's see how we can compute shortest weighted path 46:26 and see if these paths end up being different. 46:31 So is the water deep? 46:33 I don't know, that's a good question. 46:38 Can you swim in it? 46:39 I don't know. 46:40 I know they have those, like rowboats that you can rent. 46:47 I dunno. 46:47 I mean, it's a reservoir. 46:48 It looks pretty, pretty big, 46:51 so I imagine the water is pretty deep. 46:52 Probably not something you want to try to walk across. 46:57 Okay, so that's how to visualize the cipher shortest path. 47:04 How do we do shortest weighted path? 47:08 Well, let's go to the docs. 47:18 So we're going to use the Graph Data Science Library, 47:25 which is a plugin for Neo4j. 47:30 You can install it in Sandbox with just one click. 47:33 You can download it and install in a Neo4j instance. 47:38 It comes by default in Sandbox as well. 47:40 So, you might as well consider it's part of Neo4j. 47:47 And there's a number of algorithms 47:49 that are available in this Graph Data Science Library. 47:55 If you look at the docs here, 48:00 let me make this a bit bigger. 48:04 Sort of the classes of algorithms are centrality, 48:09 community detection, similarity, pathfinding, 48:15 link prediction, and node embeddings. 48:19 And then there's a Pregel API 48:22 for you to build your own algorithms with. 48:25 So centrality, these are gonna be things like 48:29 page rank, and betweenness centrality 48:32 that allow us to find the sort of 48:36 like the most important nodes in a network, 48:41 find nodes with a high influence, that sort of thing. 48:48 For the community detection algorithms, 48:52 these are going to be things like Louvain, 48:59 labeled propagation, those sorts of algorithms 49:01 that are going to be finding communities 49:03 and clusters in the graph. 49:09 The similarity algorithms, most of these are, 49:13 actually, I guess what you would call 49:16 set comparison operations, where you're comparing maybe 49:21 two vectors of ratings for 49:26 movies or music or something like that. 49:29 And you want to come up with a 49:34 measure of how similar two user's interests are. 49:38 So these are very useful often for recommendations. 49:44 So if I want to recommend a book or music or restaurants, 49:49 I want to find users that have a similar taste as myself. 49:55 And one way to do that is to calculate a 49:58 similarity between two users using 50:02 their ratings on previous products. 50:06 But anyway, the algorithms that we're interested in today 50:10 are these path finding algorithms. 50:14 And specifically, we're going to start with shortest path. 50:23 So shortest path, this is, 50:27 is Dijkstra's algorithm, essentially. 50:33 Here's the, let's see, do we have a diagram of Dijkstra's? 50:39 No, let's jump over to Wikipedia. 50:43 So, Dijkstra's algorithm. 50:48 So if we take a step back, 50:51 and look at our data, let's look at, 51:05 I guess, just the first one. 51:08 So this is, which one is this, 51:10 this is our Jacqueline Kennedy Onassis Reservoir again. 51:16 And if we expand out a few hops, 51:19 so here's some sort of metadata associated with it, 51:26 but these are the nodes we're interested in. 51:27 So what we're doing to find these routes 51:31 is we're following the route relationships. 51:34 And these are connecting nodes with the label, OSM node. 51:41 So these are open street map observations 51:45 for different, they're called ways, 51:48 but you can think of them as like a path, essentially. 51:54 And so what we're doing is 51:55 following these route relationships, 51:58 some of these OSM nodes are going to be 52:01 intersections or points of interest themselves. 52:04 So you can see, you have multiple labels here. 52:07 So this Jacqueline Kennedy Onassis node, 52:11 Onassis Reservoir node is an OSM node. 52:14 It's an intersection, and it's a point of interest. 52:18 So what we want to do is follow these route relationships 52:23 from OSM node OSM node, 52:26 until we get to our other point of interest 52:30 that we're trying to reach. 52:33 And to do that, we need to use 52:35 a graph search algorithm to do that. 52:40 The shortest path algorithm and cipher 52:43 that we were using before, that implements a what's called 52:47 a binary breadth first traversal. 52:52 So it's basically starting from the starting point 52:56 and the ending point, so in this case, 52:59 both of our points of interest. 53:01 And doing a breadth first search, 53:04 going out from each one of those nodes. 53:07 So because it's coming out 53:09 from both of those points at once, 53:11 it's typically pretty, pretty fast, 53:14 and it's always going to find the shortest path 53:18 In a breadth first manner, 53:21 but not looking at the weight of the path. 53:24 So if we look at these route relationships, 53:28 we can see that there's a distance property. 53:32 So this one is 9.48. 53:34 So meters from this node to this other node is 39 meters. 53:42 So they're not evenly spaced. 53:44 So we're not necessarily going to get 53:46 the shortest path using that breadth first traversal 53:52 that the shortest path function gives us. 53:54 So for that, we want Dijkstra's algorithm. 53:59 And Dijkstra, the way this algorithm works, 54:02 you can think of as a breadth first search, 54:06 where we're sort of looking at 54:11 each of the neighbors in each iteration, 54:14 and then the next iteration of the algorithm, 54:16 looking at all of the neighbors 54:18 of all of the neighbors we've explored so far, 54:21 and continuing on that way 54:23 until we have reached the destination. 54:30 But the difference with Dijkstra 54:32 is that along the way, we calculate the sum of 54:37 the path that we have found so far, 54:41 and we prioritize the routes with the lowest cost. 54:47 I think there's, yeah, here we go. 54:49 So here's a video of sort of how this works, 54:53 where the red nodes are 54:56 all of the nodes that we've explored, 55:01 trying to get to here, 55:03 and we're prioritizing the nodes with the lowest cost. 55:14 Okay, so that's the algorithm. 55:16 How do we actually use it? 55:18 Well, here's the documentation 55:22 in the Graph Data Science Library, 55:24 we call it the shortest path algorithm, 55:27 it's really Dijkstra. 55:28 So Graph Data Science exposes these procedures 55:33 name spaced with GDS. 55:36 So to use the Dijkstra algorithm, we'll say call, 55:41 so call is the syntax for calling a procedure in Neo4j, 55:45 call GDS dot alpha. 55:47 So this alpha, this is indicating the tier of the algorithm. 55:55 So in the data science library, there are, 55:59 like I showed a number of categories of algorithms. 56:03 So there's different implementations 56:05 in these different categories of different algorithms. 56:08 And some of these algorithms, 56:10 the implementation has been optimized. 56:14 All the edge cases have been figured out, it's really fast, 56:17 we can use it on massive data sets, 56:19 and we have a lot of certainty 56:23 that the algorithm is going to be performant, 56:26 or the implementation of the algorithm 56:27 is going to be performant. 56:29 Those are in what's called the product tier. 56:32 So those are the officially supported algorithms. 56:36 Others are name spaced in alpha, which means, 56:42 you know, it works, it's an implementation of the algorithm, 56:45 but it may not have been optimized 56:48 for the best performance or scale at this point. 56:51 So sort of a use at your own risk kind of warning. 56:54 So that's where the alpha name space comes in. 56:58 And then, the name of the algorithm in this case, 57:00 shortest path, and then we can either stream, or write. 57:05 So in this case, 57:07 we don't want to write back to the database 57:09 the result of the algorithm, 57:10 we just want to stream the results out. 57:14 Here's the different options. 57:16 And let's find an example here. 57:19 So here's an example. 57:21 We're creating some road data. 57:23 So we match on the two locations that we're interested in. 57:26 We called GDS alpha shortest path dot stream, 57:32 and then we have to create a node projection, 57:36 and a relationship projection. 57:39 So what is that? 57:41 Well, what graph data science is doing behind the scenes 57:46 is constructing a projection of 57:50 the graph that's stored in the database. 57:53 Oftentimes, we want to run our algorithms, 57:56 not necessarily on everything that's stored in the database. 58:01 Maybe just some sub graph, 58:03 maybe some projection of the graph. 58:05 Maybe we want to run our algorithm 58:08 on sort of an inferred graph. 58:12 That's often the case and things like social network data. 58:16 So we can define that projection using a cipher query 58:22 like we're doing down here, 58:24 or we can define that projection using some configuration 58:29 to specify the nodes and relationships we want to follow, 58:33 which is what we're doing here. 58:35 Okay, so let's modify. 58:41 Where did it go? 58:42 Oh there, here, 58:44 let's modify this query to now find 58:49 the shortest weighted path using Dijkstra's algorithm. 58:56 So what we're going to do, 58:59 we are matching on the first two points of interest. 59:03 So A and B, so that looks good. 59:06 Let's get rid of this. 59:08 So we'll say call GDS dot alpha dot 59:16 the shortest path dot stream. 59:22 And now we pass in some configuration. 59:28 So for the node projection, 59:32 we want to look at OSM nodes. 59:35 So all of thee nodes that were following 59:38 both our point of interests, 59:40 and the nodes that represent 59:42 the spaces on the path that we're following, 59:45 those all have the label OSM node. 59:49 And then we need relationship projection. 60:00 And here we're specifying all of the relationships 60:05 that we want to bring in to this projective graph. 60:08 In this case, it's pretty easy, 60:10 we just want the route relationship. 60:17 So that becomes the key in this configuration object 60:22 we're passing to relationship projection. 60:26 And we can specify the type. 60:31 So we want the relationship route type, 60:34 and then any properties we want to bring through. 60:38 And we're just interested in distance here, 60:39 that's the only relationship property 60:41 that we're interested in. 60:44 And then the orientation, do we want incoming, outgoing? 60:52 In this case, this is undirected, 60:54 'cause I don't know, we're not concerning ourselves 60:57 with sort of one way streets here. 60:59 We're saying, assuming you can go 61:01 either way on any of these paths. 61:04 Okay, so that's the relationship projection. 61:11 And we need to specify the start node. 61:16 So that's going to be the reservoir. 61:19 So start node is A, and the end node, 61:24 that's the Robert Burns statue, that's going to be B. 61:28 And then we need to specify a, 61:32 I think relationship distance or weight. 61:36 Yeah weight, weight property. 61:39 Relationship weight property is distance. 61:43 So that's this property here that we're bringing through. 61:46 So each one of these route relationships has a distance, 61:50 that's going to be the weight 61:51 that we're going to consider in this algorithm. 61:56 Okay, so that's our procedure call. 62:00 So that's the configuration we're passing in. 62:03 And then this procedure yields a node ID, and a cost. 62:11 So for the optimal path that it finds, 62:16 it's gonna give us the idea of the node, 62:18 and then the cumulative cost that we've built up. 62:27 It should be able just to return that. 62:34 And if we look at, oh yeah, it's not return star, 62:39 it's return node ID and the cost. 62:42 So that gave us kind of a weird output, 62:44 'cause I was returning both the nodes and tabular data. 62:47 But here's just the node ID and the cumulative costs, 62:54 measured in meters, I think here. 62:57 So here's the ID of the first nodes, 62:59 this is the ID for here. 63:04 So that's the internal Neo4j ID right here, two oh five. 63:09 So yep, here's that point. 63:10 Let's see how far we can follow this path. 63:13 And then the next one is 49 69, which is this one. 63:24 And it says the cost is 15 meters, 63:30 and yep, 15 meters is the distance on the relationship. 63:39 And then from here, what's the next one, 49 62. 63:44 So that's this one, and this distance is 43. 63:48 So 43 plus 15 is 58. 63:50 So you can see what's going on there. 63:52 This is giving us the optimal path to follow. 63:55 And after 62 steps, we get to where we want to go. 64:01 2,800 meters. 64:07 Okay, let's modify this so we can visualize it in NeoMap. 64:17 There's a helper function, 64:21 GDS util what is it, as node, I think, 64:28 that we can pass the node ID into 64:30 that will return the node itself. 64:34 And then what we want to return is 64:36 location dot latitude as latitude, 64:40 so NeoMap expects each one of these 64:44 latitude and longitude values to be 64:46 returned as latitude and longitude, 64:49 so we'll just go ahead and do that. 64:56 This should just give us a list of latitude and longitude. 64:59 And now, let's copy this, go over to NeoMap, 65:06 and let's create a new layer. 65:11 And we'll call this, Dijkstra's algorithm. 65:19 It's going to be advanced. 65:21 We're going to find it with this cipher query 65:23 that we copied it from Neo4j browser. 65:25 We want a polyline, let's change the color to black, 65:34 and let's see if these paths are different at all. 65:37 Oh, and they are, quite different. 65:39 That's fun. 65:41 So, zoom in a bit. 65:47 Here's our starting point at the reservoir. 65:51 And previously, we were taking this red path, 65:54 which is the shortest path cipher function, 65:59 not taking into account total distance. 66:03 This time, we're going the other way around the reservoir, 66:10 getting on this path over here. 66:14 And we ended up at the same place. 66:22 Cool, that's fun. 66:26 So there are several other algorithms, 66:31 pathfinding algorithms. 66:33 The other one here that would be 66:35 interesting to explore is A-star. 66:42 So A-star is similar to Dijkstra, 66:48 in that it considers the, 66:53 it considers the paths with the lowest total cost first 66:58 as it's doing that depth for traversal. 67:03 The difference though, 67:04 is that rather than just taking the sum of the distance, 67:10 it also uses a heuristic function. 67:14 And in this case that heuristic function 67:19 is geospatial distance to the destination. 67:24 So basically, A-star is saying, okay, 67:29 am I getting closer to my destination? 67:32 And is it also the shortest distance 67:36 that I've walked thus far? 67:38 So it sort of factors in this heuristic function, 67:42 which can be any function really, 67:46 according to the algorithm, 67:48 but this implementation in GDS uses geospatial distance. 67:53 There is, I think a good Wikipedia 67:59 diagram that shows kind of the distance, 68:04 or the difference rather. 68:05 Yeah, here we go. 68:06 So here's the same sort of search 68:08 that we were looking at before. 68:12 Remember with Dijkstra, 68:13 we were just sort of expanding kind of aimlessly. 68:17 But here, were picking the routes that are 68:20 getting us closer to our destination. 68:22 So it finds the destination a lot faster. 68:30 I think I'll skip that one. 68:31 You can check out, 68:32 oh, by the way, I should drop a link to the docs 68:35 for this in the chat. 68:43 From the chat, wish I could encode in Cypher like you. 68:46 Don't make any mistakes, astonishing. 68:48 Oh no, we make plenty of mistakes. 68:52 Perhaps we're just having a good day. 68:57 Cool. 68:58 So that's the link that I pasted 69:04 to the A-star implementation in GDS. 69:06 From there you can get to, if you look in the sidebar, 69:11 shortest path for Dijkstra. 69:14 The difference here with A-star is, 69:21 we have to add the latitude and longitude 69:24 property keys as well, 69:26 so that it can figure out the distance to our destination 69:31 at each iteration. 69:36 Cool. 69:37 But we'll, so we'll leave it at that. 69:39 What we want to do now is 69:43 take this Dijkstra implementation, 69:48 and stick this in our GraphQL API. 69:51 Since we know this is going to give us much better results. 70:00 So let's, 70:04 What I'm going to do, is just copy this part, 70:13 and jump back into our code for our GraphQL API, 70:23 and here, where it says route to POI, 70:29 I am just going to replace our naive shortest path example. 70:43 And, fix the indentation there, there we go. 70:49 And let's modify this now, so that we'll have 70:54 Dijkstra GDS instead of Cypher shortest path. 70:58 So our starting node is this, 71:02 remember in these Cypher directive fields, 71:08 this is a special keyword that refers to 71:10 the object that we're currently resolving, 71:12 which in this case is the point of interest node. 71:16 And then the end node is going to be other. 71:21 So the point of interest that we're passing in the POI ID. 71:32 And then what we return, 71:36 so let's modify this, returning this object, 71:40 but we don't have N in scope anymore, 71:42 instead we're calling it node. 71:44 That's what we're getting back from 71:46 the call to GDS shortest path. 71:49 We're using GDS util as node 71:52 to go from a node ID to the actual node object, 71:56 so that we can read the 72:01 location dot latitude and longitude. 72:04 'Cause otherwise here at this point, 72:06 all we have is that internal ID. 72:08 So we need to go from that 72:10 to the actual node in the database. 72:17 Okay, I think that should do it. 72:22 So let's see if that works. 72:30 So I go back to our GraphQL API, 72:32 and it's going to be a little anticlimactic 72:36 if I run the same query, 72:39 now we can see, I just get slightly different results. 72:47 But, now we know that this is indeed the shortest path, 72:54 thanks to Graph Data Science and Dijkstra's algorithm. 73:00 And you looked at this on the map to confirm that yep, 73:03 that looks like, looks like a good path. 73:09 Okay, cool. 73:10 So now we've got that on a GraphQL API. 73:17 We haven't made any changes to our Gatsby site. 73:21 I think maybe, I think maybe we'll stop there 73:25 since the next step is to add the map here, 73:30 and then the detail map 73:35 for each one of our points of interests, 73:38 which is going to show the actual routing. 73:41 So essentially, we're going to want some, 73:46 some UI kind of like this, where we're showing like, 73:49 here's where you are, here's where you want to go, 73:52 and here's the path that 73:54 you're going to follow to get there. 73:57 So we'll save that for next time, I think. 74:05 If you want to learn more about using NeoMap 74:09 to visualize spacial data, and especially paths like this, 74:17 Estelle wrote a blog post. 74:21 I think I have the link somewhere. 74:26 Yeah, here we go. 74:29 Estelle wrote this great blog post, 74:31 drop this in the chat, 74:34 sort of showing how to do the same sort of 74:41 visualization of these shortest path functions. 74:46 She's using Cypher projections for the map. 74:52 So she also shows how to import data from open street map 74:56 using this OSM NX Python library, 75:03 which is pretty neat too. 75:07 But she also shows how to use a cipher projection 75:10 in the GDS library, which we didn't do. 75:13 We were using this more configuration-based method 75:19 for defining the sub graph 75:23 that we want to use for the algorithm. 75:24 So gonna see a different approach there. 75:26 That is, that's a great blog post. 75:30 Estelle also recently published a book on 75:36 graph algorithms and graph data science with Neo4j, 75:40 which I'll drop the link to that in the chat as well. 75:45 So this is really cool, it goes into, 75:49 goes into a lot of detail on lots of different graph 75:52 algorithms in Neo4j, even beyond that 75:54 into machine learning, graph embeddings. 75:57 So definitely check that out. 75:59 All the code is online as well. 76:02 So thanks Estelle for building NeoMap for the community, 76:06 and also for writing that book. 76:09 Really cool stuff. 76:12 Cool, well, that's, 76:14 that, I think, is it for me today. 76:18 I'll push up the code from today. 76:23 Really, I guess all we changed in the code 76:27 is this Cypher query in our GraphQL schema. 76:32 But I'll try to add the documentation 76:35 and screenshots and stuff too 76:36 for how to use NeoMap to do some of these visualizations. 76:41 Cool. 76:42 Well, thanks. 76:44 Thanks everyone for joining today, 76:45 and I hope to see you next week. 76:49 Remember, we do every Thursday at 2:00 p.m. Pacific 76:52 on the Neo4j live livestream, 76:56 talking about using GraphQL and GRANDstack with Neo4j. 77:02 And I will leave you with the link to the 77:08 Fullstack GraphQL book excerpt. 77:12 This is free to download, 77:13 it's the first three chapters of the book 77:15 that show how to use, or how to build these 77:20 backend GraphQL APIs, 77:22 so focused on the back end rather than the front end. 77:25 And once we finish this Gatsby project on the stream, 77:30 we will do a Fullstack GraphQL book club. 77:34 So stay tuned for that. 77:37 And don't forget to register for Nodes 2020, 77:44 the Neo4j online developer expo and summit, 77:47 online conference coming up on October 20th. 77:53 Cool. 77:54 So with that, thanks everyone. 77:56 And we will see you next time. 77:58 Cheers.
Subscribe To Will's Newsletter
Want to know when the next blog post or video is published? Subscribe now!