Data modeling and import
Building A Real Estate Search App With GRANDstack: Part 2
Will Lyon starts the second part of his series on building a real-estate search app. This week he is spending more time on data modeling and import, working with geospatial data.
Links And Resources
- Hi folks, welcome to the Neo4j Twitch channel. My name is Will, and we'll be working with GraphQL and GRANDstack today, and every Thursday at 2pm Pacific. Last time, we were working on building a real estate search app with GRANDstack, kind of like a Zillow clone, and we're going to pick that up today. Last time we got everything started from the GRANDstack starter project, deployed to Netlify, and started looking at some data modeling using the Arrows diagramming tool. I pushed everything up to GitHub; I'll drop the link in the chat. There we go. If you want to follow along, this is the GitHub project. Last time, we used the GRANDstack starter project to create our skeleton starter, with a GraphQL API and a React application on top of Neo4j, using a Neo4j Sandbox instance. Here's a rough sketch of what we're building. The front-end is a React application; we use Apollo Client to make data fetching queries to our GraphQL API. Our GraphQL API is a Node.js GraphQL server built with neo4j-graphql-js that uses the neo4j-javascript-driver to send queries to Neo4j. Previously, we had this in Neo4j Sandbox. Then data comes back through the GraphQL API and to the web front-end. So that's the basic architecture. This is the data model that we sketched out last time, just the initial nodes and relationships. And I talked a little bit about how data modeling is often driven by the requirements of our application.
I think of this as an iterative process: we start by defining the requirements of our application, then we identify the entities (those are the nodes) and how they're connected (those are the relationships). As we start to draw out our data model, we look at the questions we want to ask of the data. So: as a user, I want to search for properties for sale in a specific city so that I can view property details. I want to at least think of some traversal through the graph, and if I can write the Cypher query for it, that's great, because then I know that this data model addresses the requirements of my application. So let's take a look at an example of what we want to build. This is Zillow, a real estate search application. We want to look for properties for sale, in this case in San Mateo. I can search by city and view property listings, and I can view detailed information about a property, such as its square footage, the number of bedrooms, and so on. All right, so that's the basic idea for this requirement. What I want to do in today's session is pick up where we left off with our graph data modeling, check that we've addressed some of our use cases with this data model (maybe we need to tweak it a little), and then get started importing some data into Neo4j and see if we can start querying it with GraphQL. So first, a little bit about data modeling. We're using this tool called Arrows, which is a graph data modeling and diagramming tool hosted on the web.
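As a sanity check on the model, the first user story can be sketched as a Cypher traversal. This is only a sketch: the label, relationship, and property names (`Listing`, `Property`, `City`, `LISTING_OF`, `IN_CITY`, `active`, `price`) are assumptions based on the arrows diagram, not a final schema.

```cypher
// Hypothetical traversal for: "search properties for sale in a specific city"
// Names are assumed from the arrows data model sketch.
MATCH (l:Listing {active: true})-[:LISTING_OF]->(p:Property),
      (p)-[:IN_CITY]->(c:City {name: "San Mateo"})
RETURN p.address, l.price, l.createdAt
```

Being able to write this query out is exactly the check described above: the traversal exists, so the model supports the requirement.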
It's very simple; it uses local storage to store our model, which we can then export in various ways. I'll drop a link to that in the chat. Cool, so one thing that I really like about Arrows is that we can check the markup that defines our data model into version control. Here I've pushed the markup for my data model up to GitHub, so I can just copy it out of the readme, go to Export Markup in Arrows, paste in what I copied from the readme, and that will load my data model so I can go in and start tweaking things. Cool, so in this case, let's look at the requirements we've identified so far, and clean this up a little. We said we have listings; a listing was created at a certain time, and in Neo4j we have a datetime object we can use for that. Then we have a Boolean flag for whether the listing is active, and we'll probably have other information on our listing, maybe such as the asking price, which is going to be an integer. So this is a listing of a property, and maybe this property is in a specific city; let's reverse the direction of that relationship. Okay, and then our property has an address, a location, and bounds. The location is a single point, in this case a latitude and longitude. The bounds is an array of points, and this is what's going to define the geometry, essentially a polygon of latitude/longitude points that define the boundaries of the property parcel. And why do we have both of these? Well, we have the single latitude and longitude for when we just want to annotate a map and show the point on a map. Let's close down Zillow.
Let's take a look here, in San Mateo; choose this one. Right, so we can see that it's annotated on the map as a single point. But if I view the details of the property, and the map for those details, now I have the ability to view the lot lines, and the lot lines show me the dimensions of the actual lot for the property. So we keep track of not only a single latitude/longitude point to represent the property, so that I can annotate it on the map and use it in map search and things like that, but also the bounds, because I actually want to work with that polygon geometry. Okay, so that's what I have so far. Let's look at some of the requirements we have. The second one we identified is: as a user, I want to limit my search to properties with certain attributes, or a range of attributes, so that I can narrow the results to those relevant for me. Okay, so what does that mean? Well, if I look at this example in Zillow (let's close out of the property details), I have a beds and baths filter, so maybe I want at least four bedrooms and at least two bathrooms. Right, and now this is changing my search results. And there's a lot more I can filter by: things like home type, price range, and so on. So let's take a look at our graph model. Now, last time I made the distinction between a listing and a property. The reason for that is that a listing is at a point in time: I may have a listing of a property, and maybe it sells.
And then two years later, the people who bought that property list it and sell it. So it's really a listing of the same property, and I can have multiple listings per property. And the question is: okay, I have lots of detailed information about the property and the listing, things like number of bedrooms, square footage, and so on. Does that belong on the listing, or does it belong on the property? Here we've started to add it to the listing: we have number of bedrooms as a property on the listing node. And this is a good question; it gets at the idea of entity versus state. The property is the entity. Maybe when it's listed once, it has two bedrooms, but then someone buys it and does a remodel, and this time it has three bedrooms. Or there can be some discrepancies in how things like square footage are calculated, so maybe the next time it's listed, it's listed with slightly different square footage, and so on. The way I like to think of this is: the property represents the entity, and the things that are never going to change should be properties of the entity, in this case the Property node. Things like the address, or the location latitude and longitude, are not going to change. But the detailed information of the listing that may change through time, that is the state of the entity. So anyway, that's how I like to think of it.
But our question was: can we envision a graph traversal where we search for properties in San Mateo that have at least four bedrooms and two bathrooms? Yes: if I have bedrooms, I can also add things like bathrooms, square footage, and so on. Then when I'm searching, I just have a predicate that specifies the number of bedrooms (at least four), bathrooms (at least two), and so on. Okay, so this requirement, I think, is addressed by our data model. The next requirement we identified last time is: as a user searching for properties, I want to view property details so I can learn more about the listing. Again, I think that's addressed by our data model, since we have the property detail information on the listing. We can think of things you might want to add here, and when we get to adding the functionality for creating a listing for a property, we'll see how to address some of the specifics. But we can see that we just need to return the properties of this node to be able to show the user some detailed information about the property. Cool. Okay, so that's addressing the requirements with our data model. Let me check the chat here, just to make sure we're not missing any questions. Again, feel free to reach out in the chat if you have any questions or thoughts. Okay, I think that's a good initial discussion of graph data modeling. Again, this is an iterative process.
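The attribute-filter requirement can be checked the same way. Again this is only a sketch, with property names (`bedrooms`, `bathrooms`, `sqft`) assumed from the model discussion above:

```cypher
// Hedged sketch: San Mateo properties with at least four bedrooms
// and at least two bathrooms; names follow the arrows model.
MATCH (l:Listing {active: true})-[:LISTING_OF]->(p:Property)-[:IN_CITY]->(c:City {name: "San Mateo"})
WHERE l.bedrooms >= 4 AND l.bathrooms >= 2
RETURN p.address, l.bedrooms, l.bathrooms, l.sqft
```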
As we discover more requirements, we may need to update our data model, and that may then inform some of the different queries that we run. The next thing I want to explore today is importing some data into Neo4j and seeing how we can use GraphQL to query it. I thought it would be interesting to look at how we can import this Property node, and the City node as well. If we jump back to our example in Zillow, where we're searching in San Mateo: if we zoom in really far, we can see that we have lots of information about all of these properties within San Mateo, even the ones that are not for sale. We know the bounds of the property, we know the square footage, we can see a Google Street View, we have an estimate of how much the house is worth, we have things like property taxes, and so on. And this data, in the US anyway, is for the most part public data. Information on parcels, property taxes, the size of the house, and so on is typically public information that is available at the state or county level in the US. I'm not sure how this works in Europe. But anyway, I thought it would be interesting to import some of this data into Neo4j and see how we can build a GraphQL API on top of it. I found some of this data for the state of Montana; I was working on an unrelated project and came across it. So we'll start here, importing some parcel data from the state of Montana.
This is their FTP server, which I'll post a link to in the readme, along with all the other steps we go through for the import. So let's start with Gallatin County. We can see we have an XML file that tells us what the data looks like, and then a few different versions of the data we can download. One is a GDB file, which I think is some sort of geospatial database format that I'm not too familiar with, but the other two formats are shapefiles. So I'm going to copy the link address for one of those, and let's jump into a terminal. Here I am in the Willow GRANDstack directory. Let's create a data directory and download this zip file. Hmm, for some reason that didn't work; maybe you can't wget a zip file from here. That's fine, we can still use Save As from the browser and save it into the data directory under Willow GRANDstack in my home folder. Okay, cool. So let's unzip that. Okay, and now we have some shapefiles with parcel information for Gallatin County, Montana. What we want to do now is load this data into Neo4j. How do we do that? Well, if we take a look at the shapefile, we can see that this is a binary format. There is a plugin, the spatial extension, that allows us to load shapefiles into Neo4j. But what I'm going to do today is convert the shapefile into GeoJSON. GeoJSON is just JSON with a specific schema.
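The download steps above look roughly like the following. The zip filename and URL here are illustrative placeholders; the real link comes from the Montana FTP server linked in the readme, and as noted, wget may not work for it, in which case saving the file through the browser into the same directory is fine.

```shell
mkdir -p data && cd data
# Download the Gallatin County parcel shapefile zip (URL and filename are
# illustrative); if wget fails, use the browser's Save As into this directory.
wget ftp://.../Gallatin_SHP.zip
unzip Gallatin_SHP.zip
```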
So it has a specific structure for storing things like geometries and metadata about those geometries. And GeoJSON is nice because, first of all, it's basically human readable: we can look at a JSON file and understand what data is there. But there's also a whole suite of tooling we can use for importing JSON into Neo4j, transforming it, and so on. So I'm going to use a tool that's part of GDAL; let's take a look at that. I'll drop this link in the chat too. GDAL is a suite of command line geospatial tools, specifically useful for transforming geospatial data files into different formats. I installed this command line tool through Anaconda, I think; there are different ways to install these tools. Specifically, the tool we want to use is ogr2ogr. This is a command line tool that is going to allow us to convert from the binary shapefiles that we downloaded into GeoJSON. So let's go ahead and do that. I want to run ogr2ogr to generate a file called Gallatin.geo.json. The -f flag says what format I want to output: I could go from GeoJSON to a shapefile, or to lots of other formats, but in this case I want to output GeoJSON. The -lco option is a layer creation option that lets me specify what the ID field in the GeoJSON should be. In this case, I happen to know that we have a parcel ID field for each parcel, and we want to treat that as the ID field in our GeoJSON. Okay, the next option (let's look at the docs for this) is -t_srs.
This says which coordinate reference system to use for the output, i.e. which transformation to apply. I forget which one the shapefile is using, but we specifically want the WGS 84 projection, also known as EPSG:4326, which I think is commonly called the geographic coordinate reference system. That's what we want to use there. And then the file we want to pull in is the shapefile that we just downloaded. So let's run this; it will take a minute or so. (If you hear a bit of background noise, my neighbors are getting some trees cut down, so apologies for the chainsaw in the background.) Okay, so now we should have a Gallatin GeoJSON file. Here we're just looking at the first few lines of it: there's some metadata, and then an array under the key features. For each one of these features, we can see that first of all it tells us the type ("Feature"), and we have an ID; this is where we specified the parcel ID from the shapefile. Then we have a properties object for each feature, and within properties we have things like GIS acres (I guess that's the acreage for the parcel we're looking at), some information about property taxes (for the 2020 tax year we have the assessment code), things like the address, how big it is, and so on. For some of these, we even have information about the number of bedrooms and things like that. Cool, and then we have a geometry property. Let's take a look at that.
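Putting those flags together, the conversion command looks roughly like this. The input shapefile name and the exact ID field name are assumptions (they depend on what's inside the county's zip); `-f`, `-lco`, and `-t_srs` are the ogr2ogr flags discussed above.

```shell
# Convert the binary shapefile to GeoJSON, reprojected to WGS 84 (EPSG:4326).
# The input filename and PARCELID field are illustrative; check the unzipped files.
ogr2ogr Gallatin.geo.json -f GeoJSON -lco ID_FIELD=PARCELID -t_srs EPSG:4326 GallatinCountyParcels.shp
```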
So here's our geometry. This is of type Polygon, and here are the coordinates: a list of longitude/latitude pairs. This is what forms the polygon of the parcel. Cool, how are we doing on time? Okay, plenty of time. Great, we have this GeoJSON file; now we want to import it into Neo4j. So let's see how we do that. Last time, we were using Neo4j Sandbox, which is just a really easy way to get started with a hosted instance of Neo4j running in the cloud. This time I'm going to use Neo4j Desktop. Neo4j Desktop allows us to run Neo4j instances locally; think of it as a delivery mechanism for the Neo4j database. If you haven't used Neo4j Desktop before, you can get it at neo4j.com/download; I'll drop that link in the chat and close some of these windows. Okay, so let's create a new project in Desktop, call it Willow GRANDstack, and create a new local database. I'm going to choose version 3.5.18. The reason I'm choosing this version instead of one of the newer 4.0 releases is that there's a specific plugin I want to use later on that I know works with this version; I'm not sure about the 4.0 version. We give it a name, and then a password, which can be anything, we just have to remember it later on. Okay, so we have some JSON data; how do we import that into Neo4j? Well, there's a library for Neo4j called APOC, which stands for Awesome Procedures On Cypher. You can think of APOC as kind of the standard library for Neo4j.
It has a ton of tools for things like data import and export, formatting data, virtual nodes, and so on, and it exposes more complex path expansions. Specifically, we're interested in the data import functionality in APOC, and we can see that there's a load JSON procedure. In Cypher itself, you may have used LOAD CSV, which is built in, so we can stream CSV files directly into Neo4j and specify with Cypher how we want to create that data in the graph. APOC's load JSON gives us similar functionality, but with JSON files. Cool, so I'm going to go into Manage > Plugins and install APOC. That will install the APOC library into this Willow GRANDstack database that I just created in Desktop. And then, because I want to load a file locally, I want APOC to be able to access files on my file system, so I need to add apoc.import.file.enabled=true to my Neo4j settings. Got that; let's hit Apply to save it, and then we'll go ahead and start Neo4j. Okay, while this is starting, there's one more thing we need to do. If we look at the geometries, we can see that we have Polygon geometries. But let's do a grep for MultiPolygon... and we can see that we also have lots of MultiPolygons. So what's the difference between Polygon and MultiPolygon? A Polygon is one single set of coordinates that defines a polygon.
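The setting mentioned above goes into the database's Neo4j configuration, edited through the Settings tab in Desktop:

```
# Allow APOC procedures to read files from the local file system (import directory)
apoc.import.file.enabled=true
```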
But a MultiPolygon can contain multiple polygons: maybe this is the boundary of a parcel, but there's a piece cut out, or there are two disjoint polygons. What I want to do is filter out any of the parcels that have a MultiPolygon, because the way I want to deal with polygons, as a list of points, is a bit too simplistic for handling MultiPolygons initially. But actually, I think we may not even get to working with polygon geometries today, so we'll save that for next time. In that case, I'm ready to grab this GeoJSON file. So let's go here into Desktop and open a terminal. One thing that's nice about Desktop is that, because it manages a bunch of different installations of the Neo4j database, it gives me the ability to open a terminal with all of the binaries loaded into the path. So I can run things like Cypher Shell or neo4j-admin, and they're on the path specific to this installation of the Neo4j DBMS that I want to work with, which is nice. I mean, if we look at where this is, it's buried somewhere within the Neo4j Desktop installation, but I don't need to worry about that. So if we look, we can see a few directories set up for us. There's a data directory, where the actual data store for Neo4j is written; a conf directory, where my settings are written; there are logs, and so on. What I want is to go into the import directory. By default, this is where files will be read from for things like APOC load JSON, LOAD CSV, and so on. And what I want to do is copy the Gallatin GeoJSON file from the Willow GRANDstack data directory.
So I want to copy the Gallatin GeoJSON file that we created into this import directory, so that I can reference it with APOC load JSON. Okay, cool. Our Neo4j instance is running, so let's go ahead and open it up with Neo4j Browser. There we go. This is a fresh installation of Neo4j; there's nothing here. We can verify that with MATCH (n) RETURN count(n), I think. But let's import some of our property data. So: apoc.load.json, and the file is Gallatin.geo.json. By default, this root path refers to the import directory for this installation, which is why we moved that file over. And then we want to yield value. But remember the structure of this GeoJSON file; let's go back and look at it. All of the data we care about is jammed under this features key. So I don't want to just return this whole JSON file; that's going to have many hundreds of thousands of rows. Instead, just to take a look at this, let's do an UNWIND over value.features, aliased as feat, and return the first 10. So how does apoc.load.json work? Well, it parses the JSON file and then yields an object. This is like a map, like a dictionary, which I can then work with in Cypher to specify how I want to create that data in Neo4j. So here we're saying: for this giant features array, I want to iterate through it, alias each one of these objects (each of these dictionaries or maps; I'll use those terms interchangeably for our purposes), and then just return the first 10.
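The preview query described above, as a single Cypher statement (the filename is resolved relative to the import directory because of the apoc.import.file.enabled setting):

```cypher
// Parse the GeoJSON file and look at the first 10 features
CALL apoc.load.json("Gallatin.geo.json") YIELD value
UNWIND value.features AS feat
RETURN feat
LIMIT 10
```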
So let's see if this works. This is parsing our JSON file, and then it's going to iterate over the features array and hopefully return just the first 10 for us. Cool, and here's the first one. It returns this object that has a geometry key, with Polygon as the type and a bunch of coordinates; then we have the parcel ID; and then we have a bunch of property information. Cool. So this is the information we have for one parcel, one property. One thing that's nice about load JSON with APOC versus LOAD CSV with Cypher: if you've used LOAD CSV before, you may have noticed that by default there's no interpretation of the type of the data we're reading in, so everything is treated as a string by default, and we need to cast it explicitly. But notice here that the types of the values have been interpreted: the property ID is an integer, this one is a string, total value is an integer, shape length is a float, and so on. So that's one nice thing we get with apoc.load.json. Okay, but we need to do more than just look at these in Neo4j Browser; let's actually create some data. So far we're just parsing that JSON file and returning it to the browser; we haven't actually created any data in the database yet. So let's now do a FOREACH. Again, we want to iterate over all of our features, so we'll say FOREACH feat IN value.features. And by the way, I like to use the Cypher refcard.
If you haven't seen it, I'll drop a note about it in the chat. The Cypher refcard is quick reference documentation for a lot of common Cypher functionality, so let's search it for FOREACH. FOREACH allows us to, as it says, execute a mutating operation for each relationship in a path, or for each element in a list, which is what we want to do. The syntax is FOREACH, then the alias you want to use, IN whatever list, then a pipe, and then whatever your mutating operation is going to be. In our case, we want to execute a CREATE operation, creating a Property node for each element in that features array. Cool. So what is that going to look like? Well, something like CREATE (p:Property); that will create the node, and then we want to set some values. Let's look back... I closed the tab; let's run that Cypher query again to show us the first 10. Well, let's just look at the file anyway. For each feature we have an ID, then we have a properties object, and our geometry object is separate from that. So let's go ahead and set... oops, must have cleared out what we had so far, that's fine. So we said it was FOREACH feat IN value.features, then a pipe, then CREATE p colon Property, and in the curly braces let's set id to feat.id. And then, because we have this map of key/value pairs anyway, we can just say SET p += feat.properties. What this will do is take this properties object and add each entry as a key/value property on the node we just created. Cool. So let's run this and see what happens.
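Assembled, the import statement sketched above looks like this:

```cypher
// Create one Property node per GeoJSON feature, copying the id and
// every key in the feature's properties map onto the node.
CALL apoc.load.json("Gallatin.geo.json") YIELD value
FOREACH (feat IN value.features |
  CREATE (p:Property {id: feat.id})
  SET p += feat.properties
)
```

Note that the geometry object is not touched here; it's separate from feat.properties, and we're saving the polygon work for next time.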
So this should be iterating over all of the features in our JSON file, creating a node for each one, setting the id property, and then setting all of the key/value pairs in the properties object as properties on the node. Let me check the chat to see if we have any questions... doesn't look like it. Okay, so once this is done, the next step is to see how we can expose some of this data through our GraphQL API, and then see how we can do some sort of filtering and so on. Okay, so this created roughly 50,000 nodes. Let's match Property nodes and look at the first 10 of these. Okay, so here's one; we can see some information about it, such as the type of property, the total acres, what city it's in, and so on. Cool. Okay, so this is obviously not all of the data that we need, and we need to flesh out our data model a bit more. If we look at our data model in Arrows, there are some things in here that we might be able to extract: we might pull out the city, pull out bedrooms, bathrooms, square footage, and so on. But what I want to do in the next few minutes is see, now that we actually have some data in Neo4j, how we start to change our GraphQL schema, expose the GraphQL API, and start to query this data in Neo4j. So let's see if we can do that. And again, let's run CALL db.schema.visualization(). Right, so far all we have is Property nodes, so it's not very graphy at this point, but that's okay. Let's just make sure we can query some of this data using GraphQL. Okay, so now we're back in our Willow GRANDstack project; this is exactly the code that's up on GitHub.
48:41 And if I do 48:45 npm run start, 48:48 this will start both the 48:52 GraphQL server 48:55 and, remember, we also had this skeleton React application. 48:59 So the starter project that we started from had data 49:04 for a sort of business reviews application. 49:08 And what we wanna do now is sort of start to change that 49:13 to reflect our 49:16 real estate search application. 49:21 So 49:22 let's jump over, it says our GraphQL API is running at 49:26 localhost:4001/graphql. 49:32 So this will give us GraphQL Playground, 49:33 which we can use to query 49:38 our GraphQL API. 49:40 And if we take a look at the docs, yep, 49:42 you can see our schema has things like users and businesses 49:45 and so on. 49:46 So let's, let's change that. 49:48 And instead, 49:50 let's set this up so that we're able to query 49:55 data based on our Property 49:58 nodes that we have in Neo4j. 50:00 So we wanna change our GraphQL schema to reflect that. 50:03 The first thing that we wanna do, 50:06 and we'll notice we have some errors here 50:08 saying that we're unable to connect to Neo4j. 50:12 And yep, that is reasonable, 50:15 last time we were using a Neo4j sandbox instance. 50:18 So if we look in the api/.env file, 50:25 we have the connection credentials to our 50:29 Neo4j sandbox instance, 50:30 which I think I have since terminated, 50:32 and moved on to other things in sandbox. 50:35 So let's change this now, 50:37 change our Neo4j URI connection string 50:41 to point to localhost:7687. 50:46 That is 50:49 the default for local databases 50:52 that we create and start using Neo4j Desktop. 50:54 So this is gonna refer to that database 50:57 that I have running in Desktop, 51:00 the username, we left that as neo4j, 51:02 and then the password I set as 51:04 let me in. 51:08 Okay, and then 51:09 if we look in api/src, 51:16 schema.graphql. 51:18 So these are GraphQL type definitions. 51:20 This is what's driving our GraphQL API.
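The updated api/.env file described here would look roughly like this, assuming the variable names from the GRANDstack starter and the credentials mentioned on stream:

```
# Connection for the local database running in Neo4j Desktop
NEO4J_URI=bolt://localhost:7687
NEO4J_USER=neo4j
NEO4J_PASSWORD=letmein
```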
51:27 And instead of users and businesses and so on, 51:32 we want to, 51:34 we want this to reflect the data that we have in Neo4j. 51:38 So I'm gonna stop our 51:41 GraphQL API 51:41 since we're gonna make some changes here. 51:44 One thing that's nice about the Neo4j GraphQL integration 51:48 is that we have the ability to infer 51:51 these type definitions from an existing Neo4j database. 51:56 Let's check out the docs on this. 51:59 So grandstack.io 52:02 /docs. 52:05 And I want, 52:08 how to use infer schema sounds like what we want. 52:10 So if we already have an existing Neo4j database, 52:16 this inferSchema function can be used to generate 52:21 the equivalent GraphQL type definitions 52:23 that describe the property graph model that exists 52:27 in that database. 52:29 So for example, if we have data about movies 52:32 and actors and users that have rated those movies, 52:36 this may look familiar, 52:38 this is the data model for the recommendations 52:41 Neo4j sandbox. 52:43 But if we run inferSchema on that database, 52:47 it automatically generates for us the 52:53 GraphQL type definitions that we see here, 52:56 which is really nice, 52:57 because then we don't have to write these by hand, 52:58 we can keep these in sync. 53:02 Cool, so a nice thing about the 53:07 GRANDstack starter project 53:08 is that it comes with an npm script. 53:12 So if we look 53:14 in the root package.json, 53:17 there's an npm script 53:20 for inferschema:write, 53:24 and that just runs the inferSchema script, 53:27 let's take a look at what that is. 53:31 So this inferSchema script is just going to run 53:38 the GRANDstack 53:41 CLI tool 53:43 and pass some flags in there. 53:45 So we're saying infer schema, 53:48 then it's reading, from my .env file, 53:53 the 53:54 environment variables that are being set for the connection 53:59 to Neo4j. 54:00 And then it's saying 54:01 write to this schema file, api/src/schema.graphql.
54:08 Okay, so what's going on there? 54:09 So, if you remember, in the last video, 54:13 when we were talking about the GRANDstack CLI tool, 54:17 we said that it's something that got installed 54:20 as part of the GRANDstack starter, 54:22 and we said it had some sort of 54:25 more advanced functionality 54:27 that we might use later. 54:29 And inferring a schema is, 54:31 is one of those things. 54:35 And this npm script just sort of 54:38 gives us a nice wrapper on top of that. 54:40 So we should be able to 54:42 do npm run 54:44 inferschema:write. 54:48 And because we've updated our .env file, 54:50 it should now be able to connect to our 54:54 local Neo4j database running in, 54:58 in Neo4j Desktop, 55:00 generate the inferred GraphQL type definitions based on 55:06 the data that we have, and then update our schema.graphql 55:12 file, 55:13 which it looks like it did, cool. 55:14 So this is the data 55:17 that we have. 55:18 So we have a single node that has the label Property, 55:23 every node has underscore ID. 55:27 The underscore ID 55:28 and the ID 55:31 fields are a bit different. 55:33 So 55:34 the Neo4j GraphQL 55:38 library adds this underscore ID field 55:42 to every type in GraphQL. 55:44 And this maps to the internal node ID 55:49 in Neo4j, which you typically 55:52 don't wanna use in most cases, 55:54 that's sort of an internal implementation detail. 55:57 However, it can be useful in some cases. 56:01 But don't confuse that underscore ID with properties 56:05 that we've set explicitly, such as the id field. 56:13 Cool, okay, so we can see also, what's interesting is that 56:18 we have some fields that exist on every instance, 56:23 on every one of those Property nodes, 56:26 and those are identified 56:30 with a bang. 56:32 So this is the continuous field, 56:36 which I'm assuming is 56:38 maybe like continuous square footage or something, 56:41 we'll have to dig in to see what this data actually means.
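The inferred type definitions end up looking something like the sketch below. Only id, _id, continuous, city, total_value, and total_acres are mentioned on stream; the exact field list and scalar types come from whatever is in your database:

```graphql
type Property {
  _id: Long!         # internal Neo4j node id, added by neo4j-graphql-js
  id: String!        # the id we set explicitly during import
  continuous: Float! # present on every node, so inferred as non-nullable
  city: String       # missing on some nodes, so inferred as optional
  total_value: Float
  total_acres: Float
}
```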
56:45 And it's of type Float bang, 56:47 which means this is a non-nullable Float, or required, field. 56:51 So every, every one of those nodes contained a value 56:56 for this property, 56:57 but then some things like certificate, city, state, 57:02 these are optional. 57:03 And because our infer schema process added these 57:07 as optional, 57:09 well, that means that there are some nodes in the database 57:12 that don't have values for that. 57:14 So we'll have to think about, in our application, 57:16 how to deal with cases 57:23 where we may not 57:26 have a value for that property 57:27 in the data. 57:28 So we'll have to think about how to account for that 57:30 in our application. 57:31 But okay, so let's, let's fire up, 57:36 npm run start, 57:37 fire up our GraphQL API. 57:40 This will also start the React web server. 57:44 I'm gonna close that though, just because 57:49 we 57:50 aren't quite ready to work with our React app yet, 57:53 since it's still set up for working with a 57:56 somewhat different data model. 57:58 But okay, now we can see in GraphQL Playground 58:02 the sort of GraphQL API that was generated 58:06 based just on our type definitions. 58:11 And so we can see that we have one query entry point 58:15 for Property, and then we have mutations 58:18 if you wanna create, update, 58:20 merge, or delete some of these. 58:23 So 58:24 let's start off here. 58:26 Let's make this a bit 58:28 bigger. 58:29 There we go. 58:30 Cool, so Property, let's look at the first 58:36 hundred or so. 58:38 And 58:40 let's close the docs here. 58:41 Let's return 58:46 ID, we know everyone has an ID, 58:49 and then 58:50 address, we have a few different 58:54 address properties. 58:55 So this should return. 58:57 Yep. 58:58 Cool, so you can see some properties don't have an address. 59:04 Actually, quite a few don't, 59:05 but everyone has, 59:07 has an ID. 59:13 Let's take a look at 59:16 what else do we have.
59:17 We have things like 59:19 assessment. 59:23 We also have things like total value. 59:30 Okay, so we can see that we're querying some data 59:34 from our Neo4j database. 59:36 If we jump back, 59:38 we can take a look at the generated Cypher queries. 59:44 So what's going on here is, when we run this GraphQL query, 59:48 it says find the first 100 59:54 nodes with the label Property 59:55 and then return some of these values. 60:00 These fields are called the selection set, 60:01 so ID, address, assessment, and total value. 60:07 That GraphQL query is then translated into this Cypher query 60:10 that actually runs against Neo4j and returns our data. 60:14 Okay, so that's pretty cool. 60:17 Let's maybe 60:21 do something a bit more interesting. 60:24 Let's 60:26 order this by 60:29 maybe 60:31 total value 60:34 descending. 60:35 So what is this, this is gonna tell us 60:38 the 60:39 most expensive property in that county. 60:45 So that is 915 Highland Boulevard. 60:49 And the assessed value is 60:52 112 million dollars, 60:55 I guess is what that means. 60:57 Okay, cool. 60:58 So now we can start to see how we can use some of the 61:04 features that we get in GraphQL 61:06 to map to some of the business requirements 61:10 of our application. 61:12 I should say, specifically, some of the features we get with 61:14 the Neo4j GraphQL integration, 61:16 right, this sort of query generation, 61:19 is automatic ordering. 61:21 We can also do filtering, 61:24 which is neat. 61:25 So do we have anything on 61:28 square footage? 61:32 Let's see. 61:36 We have total acres, 61:41 34. 61:42 So maybe, maybe we want to, 61:46 let's say, filter now, 61:55 where total acres 62:03 has to be 62:05 maybe at least 10. 62:09 So one thing that's nice about 62:11 the Neo4j GraphQL integration is that, 62:13 from just our type definitions that we defined here, 62:17 the generated API includes 62:21 these ordering and filtering arguments.
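The ordering and filtering query built up here would look roughly like this in GraphQL Playground. The generated argument names follow neo4j-graphql-js conventions, and the field names come from this county's dataset, so yours may differ:

```graphql
# Find the 100 highest-assessed parcels with at least 10 acres
{
  Property(
    first: 100
    orderBy: total_value_desc
    filter: { total_acres_gte: 10 }
  ) {
    id
    address
    total_value
    total_acres
  }
}
```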
62:25 So now we're filtering for 62:28 parcels that have at least 10 acres, 62:32 ordering by their total assessed value. 62:38 Cool, so we can start to see how 62:41 we get a lot for free with the Neo4j GraphQL integration. 62:45 That's super helpful. 62:47 That's gonna be really nice for us 62:49 when we start to 62:52 look at how we can search for properties 62:56 in the front-end. 62:59 I have a few more minutes 63:01 before I have to leave and jump on another call. 63:04 But let's, let's take a look at one more 63:08 cool thing we can do here. 63:09 So if we have, 63:13 if we have Zillow 63:16 open somewhere, here we go. 63:19 So one thing that we wanted to add to our application 63:25 is sort of an estimated home value feature. 63:29 Zillow calls it the Zestimate. 63:32 So, for any house, 63:35 it's sort of an estimate of how much 63:37 this house is worth. 63:39 And there's, 63:42 the Zestimate one is 63:45 quite complex, 63:47 I'm sure there's a lot that goes into it. 63:52 We will, 63:55 we will have to look at sort of a 63:59 more complex way to go about this. 64:01 So for example, Zillow has this idea 64:04 of comparables, showing us similar homes 64:07 and what they've sold for. 64:09 But one thing we can do, that is maybe an 64:16 initial way to look at this, 64:21 this estimated sales value, is, 64:25 well, we have this total value, 64:27 this is sort of the, the tax assessed value. 64:31 And if we had information about 64:35 the sort of percentage difference 64:38 that houses in a certain market 64:40 were selling over the total value. 64:42 I know, in the US anyway, in most cities, 64:47 in most counties, 64:48 there's sort of an assessed value, 64:49 which may be something like $100,000, 64:52 but on average, homes in that county sell 64:55 for maybe like 20% above the assessed value.
65:00 So if we have some simple model like that, 65:03 we can then add a computed field to our GraphQL schema 65:09 that's going to be defined with a Cypher query, in this case, 65:14 to determine what is the estimated sales price, 65:22 based on just multiplying that total value, 65:25 that assessed value, by some amount. 65:28 So let's add that really quick. 65:31 We'll just call this 65:34 estimated sales price, maybe. 65:38 And it's gonna be an integer. 65:40 And we're going to use the Cypher schema directive. 65:47 So what is the Cypher schema directive? 65:50 If we jump back to our documentation, 65:55 look at schema directives. 65:58 So a schema directive is a way to annotate 66:04 fields or a type definition in your GraphQL type definitions 66:08 that indicates that there should be some custom logic 66:11 going on. 66:13 And there's a few that we use 66:15 in the Neo4j GraphQL integration. 66:17 Cypher specifically allows us to define 66:20 custom logic using Cypher, 66:24 so it allows us to define computed fields, 66:28 in this case, we're defining a computed scalar field. 66:35 So we're automatically injected a this 66:43 object. 66:46 So 66:47 this is going to be the node. 66:51 So if we just made this 66:54 this dot total value 66:57 times 1.2. 66:59 So maybe our model says that 67:03 houses in our market are selling for, on average, 20% above 67:09 the assessed tax value, 67:13 multiply our assessed value by 1.2, 67:16 and we have the estimated sales price. 67:19 Cool. 67:20 So let's 67:22 see if we can pick that up. 67:25 Now we have the estimated sales price. 67:29 Let's add that. 67:32 It cannot represent non-integer value. 67:36 Oh right, because we're multiplying by a fraction here, 67:42 that is a float. 67:47 Let's cast this to an integer. 67:50 So we'll say return toInteger, 67:57 this dot total value times 1.2. 68:02 Cool. 68:03 Let's try that again. 68:06 Okay, server was starting. 68:08 There we go. 68:10 Cool, so now we have a total value.
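The computed field built up here ends up looking something like this in the type definitions. The 1.2 multiplier is just the placeholder pricing model from the stream, and the other fields shown are a minimal subset:

```graphql
type Property {
  id: String!
  total_value: Float
  # Computed field: neo4j-graphql-js binds `this` to the current node
  # and runs the Cypher statement at query time. toInteger() casts the
  # Float result so it fits the declared Int type.
  estimatedSalesPrice: Int
    @cypher(statement: "RETURN toInteger(this.total_value * 1.2)")
}
```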
68:13 So again, we're ordering by total value descending. 68:18 So these are the most expensive properties. 68:20 So this is a total value of 112 million, 68:23 estimated sales price is twenty percent higher, 68:27 134 million, and so on. 68:32 Cool. 68:33 So I think that is pretty good progress for today. 68:40 So just to recap, 68:42 things that we touched on, 68:44 we did a little bit of talking through our data model, 68:50 and verifying that 68:53 our requirements, 68:55 at least the ones we've identified so far, 68:58 can be met using the data model 69:00 that we defined last time. 69:02 We imported some data into Neo4j 69:07 based on the data that we downloaded 69:12 for a specific county. 69:13 We converted that from the shapefile 69:16 that we downloaded into GeoJSON, 69:19 and then we used apoc.load.json 69:22 to import that data into Neo4j. 69:26 Then, in our GraphQL API, 69:29 we used the infer schema 69:31 functionality to generate GraphQL type definitions 69:35 from that data in Neo4j, 69:39 which gave us the ability to query that using GraphQL, 69:42 including doing some of this filtering and ordering 69:46 that will be very useful for us 69:48 for our property search functionality. 69:51 And then we used the Cypher schema directive functionality 69:57 to define a computed field in our GraphQL schema, 70:02 using Cypher specifically 70:04 to calculate an estimated sales price for our properties. 70:09 Cool, so that I think is pretty good. 70:12 We'll go ahead and stop there. 70:15 I'll be sure to push all of this up to GitHub. 70:19 The link 70:21 is in the chat, but it's this 70:25 Willow GRANDstack repo, 70:27 which again, I had just created last time, 70:30 just from the GRANDstack starter. 70:33 Cool, so next time, I think we'll continue on 70:37 our data import journey, 70:40 next time working with some of the geospatial data 70:44 that we had, we had latitude and longitude 70:47 for the properties, we had the bounds, right, 70:51 the polygons, to work with.
70:53 So we'll see how we can include that in Neo4j, 70:56 and then how we can work with that geospatial data 71:00 in our GraphQL API. 71:01 So we'll pick up with that next week. 71:05 So just a reminder, 71:06 every Thursday at 2pm Pacific, 71:09 we'll be working on building out this application 71:12 using GRANDstack. 71:15 Cool, so hope you enjoyed that. 71:17 In the meantime, feel free to reach out 71:20 and ping me on Twitter or on the Neo4j users Slack, 71:25 which you can join at neo4j.com/slack. 71:29 Cool, thanks a lot and I'll see you next time.