Extending The Reach Of Graphs
Keynote Session From Neo4j Connections: Graph Architecture And Integrations
In this keynote talk for the Neo4j Connections: Graph Architecture and Integrations event, David Allen and William Lyon from Neo4j cover the Neo4j Connectors and integrations that enable using graphs and Neo4j with technologies such as Apache Spark, GraphQL, Apache Kafka, and business intelligence tools that use SQL and JDBC.
Links And Resources
- Slides
- Neo4j Connectors
- Neo4j Labs
- All videos from Neo4j Connections: Graph Architecture And Integrations
- So thank you again for being with us here today. Reminder, please do use that #Neo4j if you wanna share. I'm really excited to take us into our first session. David Allen is a Partner Solution Architect at Neo4j and joined the company a few years ago out of the Neo4j developer community. David has extensive technology experience, having worked at organizations including the Department of Defense and Booz Allen Hamilton, and he's held roles like principal engineer as well as Chief Technology Officer. David holds a bachelor's degree in computer science and a master's in information systems from Virginia Commonwealth University. Will Lyon is a Developer Relations engineer on the team here at Neo4j who recently marked his fifth anniversary with the company. So congratulations, Will. He's also the author of Manning's recently published book called "Fullstack GraphQL Applications with the GRANDstack." Will has held numerous positions as a software engineer prior to joining Neo4j, and he has a BA in economics and a master's in computer science from the University of Montana. Let's dive into our program. Welcome, David and Will.
- Thanks, Lance, for the introduction. I'm really glad to be here today to talk about how Neo4j is making investments all over the place to help you extend the reach of graphs, and to talk a lot about all of the different projects that we've got going on and the ways that we're gonna make graph computing more powerful over time. My name is David Allen. I'm a Technology Partner Architect at Neo4j and I work in the partnering group with a lot of third-party software companies helping to deliver some of these integrations. Will, would you like to introduce yourself?
- Yeah, hey folks, my name is Will Lyon. I work on the Developer Relations team at Neo4j and specifically I spend most of my time working on what we call Neo4j Labs projects, which are more experimental integrations and extensions for Neo4j.
- Okay, so let's get started.
If you've been in the graph ecosystem for a while, you probably know just how unique graphs are. They bring this very powerful and distinctive way of looking at data that's relationships first. Unlike a lot of different systems that are really focused on the individual atoms of data, or the individual rows and records in a table, graphs are really focused on the relationships, and that helps tie the data together and give it a whole lot of context. That makes graphs unique, and so we frequently come across users where, when they first get the concept and they first understand where graphs fit into their architecture, it's a bit like magic. And you sort of see the sparkle go off in their eyes as they picture all of the different things that they can do with this technology. But the downside of graphs is that they're also very different than the rest of the ecosystem that's out there. And so when we think about how we build systems today, typically it's in one of two models. On the left we might have what we would call the solar system model, where you take one particular technology like a graph database or a messaging system or anything like that, and you place that at the center of your architecture and all the other pieces revolve around it. Just like we see in the solar system, everything revolves around that central sun, that central point of gravity. Now, increasingly as people are building modern applications, they're doing it more like a Lego block type of a setup where they have a lot of different components, each with its own individual purpose and function, that fit together just so. And so when we think about how we build these systems, we might talk about the solar system model versus the Lego block model. And over the last couple of years at Neo4j we've been thinking a lot about how graphs fit into that picture and how graphs fit into modern systems design and development.
Now, the inescapable reality of what's going on out there is that with cloud platforms and all of these different components, what we're seeing is more and more Lego blocks in the bin. And it's not just more Lego blocks, whether the Lego block might be a database, a queuing system, a machine learning solution or any number of other components, but it's also that we're building more kinds of systems and more complex systems out of those various Lego blocks. And so when you go to a cloud provider today, for example, you might be able to take any number of different components and managed services off of the shelf, and then from those components build any number of different systems. Whether it's the little car or the piece of art on the right or the complex building on the bottom here, they're vastly different in their purpose and their design, but they're all made of the same technical building blocks underneath the covers. Now in the graph world, we have this 900-pound gorilla that we need to talk about, and that is tables. Tables are very, very dominant in the data world. And so when we think about how unique graphs are and the value proposition that they bring, we also have to address how different they are from tables. And so if we wanna fit into this ecosystem of applications that are being built around graphs, we have to have a way of dealing with the table world and getting data easily and simply back and forth, to and from the table world, since honestly, that's where a lot of people live. You can't get full value out of graphs without being able to move in, around, and through tables, since so much of the enterprise's data is already in that format and structure. And so really when we think about Neo4j connections, it's really about graph outreach. So we know that graphs are a powerful technology that has applicability in a lot of different areas, but in order to see that impact, it has to work well with just about everything else.
And so the key message today and one of the key messages of this entire event is how Neo4j is investing to expand the reach and impact of graphs. We don't wanna be thinking about graphs as sort of like the sun at the center of the solar system, where everything else must revolve around the needs and specifications of the graph. But rather, we wanna make graphs into an easily reusable Lego block that you can then use to add on to any kind of an existing system. That's really what we mean by extending the reach of graphs. So that's a pretty idea, but we need to talk a little bit more about how we're actually going to do that and how we're going to extend and expand the reach of those graphs. Well, a while ago, Neo4j introduced one of its first connectors, which was the Neo4j Connector for Apache Kafka, also called Neo4j-Streams. And to give you an example of how this extends the reach of graphs: in this particular picture, what we've got is streams on the bottom left and tables on the bottom right. In the Apache Kafka world, they already provide this fluent interface for moving back and forth from streams to tables. And what we did with the Neo4j Connector for Apache Kafka was to add graphs to the mix. And so, whereas before Kafka had a stream-table duality, we kind of added graphs and created a trinity so that you can move back and forth between all of these different things. If you have a stream, you can easily turn it into a graph. If you have a graph, you can turn it into a table, and so on and so forth. So how does that really apply? Like where would that become really valuable? Well, in this particular scenario, we have an application that was built for an insurance company, and they had a fraud analytics need. So they had a number of different streams of information that were already on Kafka. And we used the Neo4j Connector for Apache Kafka to take those streams of information about customer interactions, fraud suspects, and confirmed fraud signals.
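As a rough illustration of the stream, graph, and table conversions described above, here is a minimal plain-Python sketch. It does not use Kafka, Neo4j, or the connector itself; the event fields (`customer`, `device`), the `USED` relationship name, and both helper functions are hypothetical, chosen only to show the shape of the idea.

```python
from collections import defaultdict

# Hypothetical stream of events, standing in for messages on a Kafka topic.
events = [
    {"customer": "alice", "device": "d1"},
    {"customer": "bob", "device": "d1"},
    {"customer": "carol", "device": "d2"},
]


def stream_to_graph(stream):
    """Fold each event into a (customer)-[USED]->(device) edge."""
    edges = set()
    for event in stream:
        edges.add((event["customer"], "USED", event["device"]))
    return edges


def graph_to_table(edges):
    """Project the graph back into rows: one row per device with its customers."""
    by_device = defaultdict(set)
    for customer, _relationship, device in edges:
        by_device[device].add(customer)
    return [
        {"device": device, "customers": sorted(customers)}
        for device, customers in sorted(by_device.items())
    ]


graph = stream_to_graph(events)
table = graph_to_table(graph)

# Devices shared by more than one customer: a toy stand-in for a fraud signal.
shared = [row for row in table if len(row["customers"]) > 1]
```

In this toy data, the only row in `shared` is the device used by two different customers, which is the kind of relationship-based signal that is awkward to see in the raw stream but falls out naturally once the events are linked.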
And then we sent those through the Kafka message bus to Neo4j, where we wrote them into a set of graph patterns. Now, because Cypher and graph analytics make it so easy to identify potential fraud, the customer was able to develop this flagger application that would flag potential fraud inside of the graph directly. And then onto that, we added a GRANDstack application where an analyst could actually investigate and adjudicate cases of suspected fraud, and then send that information back to Neo4j. Once we actually have a confirmed case of fraud, because the Neo4j Connector for Apache Kafka is bi-directional, it's very easy to share that information with the rest of the enterprise, whether it's running on Kafka or Oracle or any other technology. And so previous to this particular connector, you might have had to develop a lot of custom code to make this integration possible, or you might've had all of those graph insights locked up just within the Neo4j graph island within your enterprise. But by having this connector, we have the ability to share that out and to make those insights actionable for the rest of the enterprise. So we didn't stop with the Neo4j Connector for Apache Kafka. The next one that we did after that was the Neo4j Connector for business intelligence, or BI. And so one of the common patterns that we saw in our customer base was, in the dotted box at the top here, you had a customer who had a fraud analytics app, for example, and they were happily using Neo4j and getting a lot of value out of that. But in another part of the enterprise, there was a business intelligence platform in use that was being used to pull data from a lot of different systems and then fuse it into dashboards. And so a lot of companies use something like a business intelligence suite for providing insights to decision makers. The decision maker sees a set of simple metrics that is driven by operational data at a lower level in the company.
Now, those business intelligence platforms fuse data from a lot of different systems. Here I've added Oracle, but in reality, when you go do this with a customer, it ends up being more complicated and you have multiple different sources. So what the Neo4j Connector for BI basically does is allow you to query Neo4j with SQL directly. And the power of that is that you can take existing tools like Tableau and Sisense, which we're going to be discussing today, and you can pour that data into a business intelligence platform and use it to drive dashboards and key decision points for decision makers within your enterprise. So, one of the fun ways that we took this particular connector and applied it was in a retail promotions case. So a lot of retailers are going to have information within their enterprise about sales of items over months. And what we did is we took all of that data, put it into Neo4j, and then we used advanced graph algorithms to cluster that data into communities. And so by using some of these graph techniques, we can look at communities of purchasers and see how their behavior segments, so that we can get to a kind of insight like "households that buy this item also tend to buy this item." That in turn can help drive, for example, a retail recommendation engine. So in this particular promotions case, what we did is we took the profitable communities, those that produced the highest volumes of revenue and margin for this particular retailer. And then we identified a set of items that were sold by the retailer that were low volume, but that were also bought by the profitable communities. And this kind of an insight allows you to target a promotional program to that particular segment of your customers. You can look at low volume items that you would like to boost.
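The selection step just described can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the purchase rows, the `community_of` mapping (standing in for the output of a community detection algorithm), the function name, and the `top_n` and `max_units` thresholds are all hypothetical, and revenue alone is used as the measure of a profitable community.

```python
from collections import defaultdict

# Hypothetical purchase rows: (household, item, revenue).
purchases = [
    ("h1", "coffee", 12.0),
    ("h2", "coffee", 12.0),
    ("h1", "grinder", 40.0),
    ("h3", "coffee", 12.0),
    ("h3", "tea", 5.0),
    ("h4", "tea", 5.0),
    ("h4", "coffee", 12.0),
]

# Hypothetical community assignment, as a clustering step might produce it.
community_of = {"h1": "A", "h2": "A", "h3": "B", "h4": "B"}


def promotion_candidates(purchases, community_of, top_n=1, max_units=2):
    """Return low-volume items that the most profitable communities already buy."""
    revenue = defaultdict(float)  # total revenue per community
    units = defaultdict(int)      # number of sales per item
    buyers = defaultdict(set)     # communities that bought each item
    for household, item, amount in purchases:
        community = community_of[household]
        revenue[community] += amount
        units[item] += 1
        buyers[item].add(community)

    # Keep the top_n communities by revenue (assumption: revenue only, no margin).
    profitable = set(sorted(revenue, key=revenue.get, reverse=True)[:top_n])

    return sorted(
        item
        for item, sold in units.items()
        if sold <= max_units and buyers[item] & profitable
    )


candidates = promotion_candidates(purchases, community_of)
```

With this toy data, community A generates the most revenue, and the only low-volume item it buys is the grinder, so that is the single promotion candidate.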
We know that profitable communities are already buying them, and this then implies what kind of promotion to build that would place those items in front of the right segment of customers to drive greater sales and profitability. All of that is basically possible because the graph analytics that we were doing within Neo4j flows seamlessly through to BI suites. In this particular case, this is a screenshot of Tableau doing this analysis. One of the things we're here to talk to you about today is the launch of the Neo4j Connector for Apache Spark. And this is a similar story, but instead of business intelligence and Apache Kafka, this brings the power of graphs to the Spark world. So in this overall diagram, what it shows is this bi-directional flow of data. On the left you have Spark, which is really powerful for assembling and transforming large datasets. The connector itself allows you to reshape all of those tables into graphs and put that into Neo4j, where it can be enriched and where you can add a lot of value to the graph using the existing analytics that Neo4j provides, as well as the graph data science library. Once you process those graphs, you can easily transform that data back into tables, and then you can push it to any other system in your enterprise via Spark. Doesn't matter whether we're dealing with CSV files on Amazon S3, or whether we're dealing with call center data in a bespoke system, or maybe it's Oracle. In all of those cases, typically Spark has the plugins that are necessary to orchestrate that overall data flow. Now over the last couple of months, one of the really fun things that I've been doing with this is to use the Spark connector for trend analysis in global news. And so I personally have been playing with the Spark connector together with Databricks, which is a hosted Spark environment, and Neo4j Aura, connecting those two together. And so there's this wonderful data set out there that's called GDELT.
It's done in cooperation with Google, but effectively what they do is they publish all of the different URLs that are scraped along with annotations about those URLs, like what persons, locations and organizations the URL is talking about. And so if you take this data and transform it into a graph, what you're really looking at is a global view of all of the different events that are happening worldwide. And you can use that graph as the basis for some really, really cool use cases around predictive analytics, looking for potential signs of manipulation behavior in the news, and a number of other use cases as well. And so where I've personally been starting off with the Neo4j Connector for Apache Spark is transforming this complex GDELT dataset, putting it into graphs, and then applying some of our graph data science processes to the GDELT graph to give global graph insights about what's happening on the planet in real time. Will.
- Great. So the next integration I'd like to talk about is the Neo4j GraphQL integrations. If you're familiar with GraphQL, you know that it's really been a paradigm shift for how we work with data in web and mobile applications. So GraphQL brings this strict type system, a graph data model, and this idea of arbitrary traversals through a data graph. It brings those ideas to the world of APIs. Now the Neo4j GraphQL integrations enable developers to build GraphQL APIs backed by Neo4j, but also, with the GRANDstack framework, developers can now build these data intensive web and mobile applications using graphs all the way down the stack, both the front-end and the backend. Now, this has been an area that I've been particularly interested in myself. I've been working on writing a book published by Manning called "Fullstack GraphQL."
The goal of this book has really been to show how you can use these technologies, Neo4j, GraphQL, React and Apollo, together to build applications that take advantage of graphs all the way through the stack, and the benefits that you get when you do that. So let's take a look at an example of the type of application that we would build using Neo4j and GraphQL. So, one thing that's really nice about GraphQL is that it brings the power of graphs in Neo4j all the way through to front-end developers. And it also serves as a great integration point for bringing in data from different systems and different APIs into our application. So this example here is a travel guide application that we built. This is actually one we built on the Neo4j livestream, so you can see all of the code and actually recordings of all the videos of us building this if you're interested. But the idea here is we took some data for New York City from OpenStreetMap and loaded that into Neo4j. We then built a GraphQL API on top of that, which gave us the ability to find efficient real-time routes between points of interest within New York City. So, one thing that graph databases like Neo4j are really good at is using graph algorithms to do very efficient real-time routing. You see this a lot in logistics and those sorts of use cases. But not only are we pulling in the power of routing with graph algorithms in this example, we're also pulling in data from Wikipedia. We're also pulling in crowdsourced images through the Mapillary API, all through the Neo4j GraphQL integration. So it's a really powerful tool for building these types of data intensive applications. So we've talked a bit about some of the different connectors and integrations. We've given some examples of what you can build with those integrations and Neo4j. Let's talk a little bit about how we work on these in Neo4j and how we deliver these to Neo4j customers and users.
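To make the routing idea from the travel guide example a little more concrete, here is a minimal plain-Python sketch of shortest-path routing over a small weighted graph. The talk does not say which algorithm the application uses; Dijkstra's algorithm is swapped in here as a common choice, and the `roads` data, place names, and distances are entirely hypothetical. In the real application this work would happen inside Neo4j rather than in application code.

```python
import heapq

# Hypothetical road graph: node -> list of (neighbor, distance in metres).
roads = {
    "museum": [("park", 400), ("cafe", 900)],
    "park": [("museum", 400), ("cafe", 300), ("bridge", 700)],
    "cafe": [("museum", 900), ("park", 300), ("bridge", 200)],
    "bridge": [("park", 700), ("cafe", 200)],
}


def shortest_route(graph, start, goal):
    """Dijkstra's algorithm; returns (total_distance, path), or None if unreachable."""
    queue = [(0, start, [start])]  # priority queue ordered by distance so far
    visited = set()
    while queue:
        distance, node, path = heapq.heappop(queue)
        if node == goal:
            return distance, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, weight in graph.get(node, []):
            if neighbor not in visited:
                heapq.heappush(queue, (distance + weight, neighbor, path + [neighbor]))
    return None


route = shortest_route(roads, "museum", "bridge")
```

Here the direct-looking options (museum to cafe to bridge, or museum to park to bridge) are both longer than the three-hop route through the park and the cafe, which is the kind of answer a graph traversal finds without any special-casing.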
So first of all, you'll notice that these connectors have a consistent naming: Neo4j Connector for Apache Spark, Neo4j Connector for Apache Kafka. So that consistent naming allows you to know that all these connectors are supported for Neo4j enterprise customers. So under the same terms as your existing subscription, there is no increase in price for these; that's included in the Neo4j enterprise license. And the goal for these connectors is to target the popular platforms and technologies that really open up some of these graph use cases throughout your architecture. So things like messaging via Kafka, transforming data through Spark, and then really the entire SQL universe with the BI Connector. Now you may see other projects that are integrations between Neo4j and other technologies that carry the Neo4j Labs label. So while connectors are official supported integrations, there's another group at Neo4j working on more experimental projects for Neo4j, integrating those with other more upcoming technologies and really addressing emerging developer trends. And that's the Neo4j Labs group, which is where I spend most of my time. There's also this idea of incubating and validating some of these projects. So some of these projects that David was talking about, the Neo4j Connectors, actually started as Neo4j Labs projects. But again, the idea is to have consistent labeling of these projects as, you know, Neo4j Labs, so that you know that while these may not be commercially supported, there is a team at Neo4j working on validating these integrations and extensions.
- So Neo4j invests in these things in those two categories, Neo4j Supported Connectors and Neo4j Labs. And at the beginning of this presentation, we talked a little bit about why it's so important to extend the reach of graphs. We gave some use case examples. And then we talked about how we invest in this, but concretely what's new today? Well, there's two things that are new today.
The first thing is we're announcing a partnership with Sisense. Sisense is a really powerful data and analytics platform. It really helps data engineers and developers build applications that are highly interactive and tailored to user experiences. Essentially, what Sisense allows is that with Neo4j you can keep your data in the graph and you can query it with SQL just as though it were tables. In turn, all of the functionality of Sisense now applies to the data that's in your graph. This particular partnership allows people to analyze patterns in the data through the relationships. And that's really the theme of graphs from the very beginning; we talked about how graphs really focus on those relationships. And this powers a lot of really powerful use cases like recommendation engines, pricing and promotion, fraud detection, and many others. And Sisense happens to be really good for cloud developers as well. Now, this partnership is particularly exciting for me because it's an example of how some of our connectors, too, can broaden their reach and impact. You're going to be hearing a lot more about Sisense in a later session today. And I would really strongly recommend that you go to that session and check out all of the details. They're gonna show you some really cool stuff that's possible in part because of the BI connector and Neo4j and Sisense working together. We're really enthusiastic about partnerships like this, because again, it's all about extending the reach of graphs into other platforms. And now all of Sisense's customers can benefit from the same insights that the existing Neo4j customers have had for some time. The second major piece of news today is the general availability of the Neo4j Connector for Apache Spark. We've taken a whole lot of research and work and put it into an easily packaged connector that works great with a number of different Spark environments. So it supports both Neo4j 3.5 and 4.0.
It will work with single instance or cluster, whichever you're running, and it's focused on a newer API within the Spark world called the Data Source API. And that makes it just very convenient to use. Within the Spark ecosystem, they already have so many different sources that they can read and write data to and from. In Spark, those are referred to as DataFrames, and in the Neo4j Connector for Apache Spark, we simply extended that to include graphs. Now, this is another one where we're going to have a detailed session, where a colleague, Andrea Santurbano, and I are gonna be talking about this. Definitely have a look at that too, because it's a powerful new capability for anybody who's using Neo4j. The bottom line here is that you can read anything from Neo4j into Spark, write anything back to Neo4j, and then easily transform to and from graphs and tables. And this is really one of the themes here of how we're going to broaden the impact of graphs. At the bottom of the slide is a link where you can find out a whole lot more. And as of right now, you can go there, read the documentation, download the package directly, and we would love it if you would get in touch with us and provide some feedback about what your experience with it was.
- Great. So what now, where do we go from here? I guess, what does this really mean for us as Neo4j users and customers? So, first of all, there's this idea of more value that we'll receive as Neo4j users and customers. So as more and more of these connectors and integrations come online, it will enable us to bring graphs to different parts of our architecture. This is the idea of portable graph insights, the idea that we can share the value that we get from graphs and Neo4j throughout the enterprise. You can think of the concept of polyglot persistence. This will make it easier for us to bring graphs and Neo4j into an existing workload in our architecture.
And really this is just the beginning. Neo4j is continuing to invest in these integrations and extensions. And we're also really excited to see what our third-party partners are building, and what new use cases these connectors enable for them as well. So I mentioned earlier this idea of Neo4j Labs as a place where we incubate and validate extensions with Neo4j and other technologies, and also address new emerging developer trends. So I wanna give maybe a little sneak peek at some of the projects that we have within Labs that we're working on now. So these are currently available within Neo4j Labs; you can download and play around with any of these. And one trend that I think is really interesting is this idea of low code development. This has maybe become a bit of a buzzword at this point, but I think low code really is just the idea that developers want to solve problems, and low code tools enable them to solve problems by writing less code than they would otherwise. So we address this low code development trend within Neo4j Labs in a couple of areas. Two that I'll highlight: one is what we call a graph app, which is really a plugin or extension for Neo4j Desktop. So there's a graph app called GraphQL Architect, and GraphQL Architect allows us to build, test, and deploy GraphQL APIs backed by Neo4j, really without writing any code at all. That's a great tool for building and developing GraphQL APIs. The next low-code tool is the Graph Algorithms Playground, or NEuler. And this graph app allows you to explore the graph data science algorithms, so being able to construct the code necessary to run these graph algorithms in Neo4j, but also to visualize the results as well. Another trend that we're addressing within Neo4j Labs is this idea of cloud native architectures. And one of the Labs projects that addresses this is the Neo4j-Helm chart, which allows you to deploy Neo4j clusters on Kubernetes. And the final trend that I'll point out is this idea of knowledge graphs.
There's a Neo4j Labs project called Neosemantics that allows you to work with RDF data in Neo4j for building out your knowledge graph.
- Great. So we've talked about these connectors and integrations at a high overview level. Now later on throughout the day, we'll take a deep dive into each one of these connectors in more detail. So we'll hear from Chuck Frisbie from Sisense, who will show us how to use the Neo4j Connector for BI with Sisense for analytics. We'll hear from Dave Fauth from the Neo4j field team; he'll show us some best practices and performance optimizations when using the Neo4j Connector for Apache Kafka. Then later on, David will be joined by Andrea from LARUS, and they'll take a look at using the Neo4j Connector for Apache Spark. We'll also hear from Will Reynolds from Hoare Lea, who will show us how to use Neo4j and GraphQL in the real world. And then finally, we will hear from Julia Neagu and Anthony Deighton from Tamr, who'll be joined by Nav from the Neo4j solutions team, and they'll be taking a look at cloud native data mastering. So I think these deep dive sessions look really interesting and should give us some insights into new ways that we can use Neo4j and different ways that we can integrate Neo4j into our architecture. So really looking forward to that. Well, thanks so much for joining this session with David and myself, and hopefully we'll see you in some of the sessions later on today.
- Thanks.
- [Mary] Hey Will, hey David, it's Mary Lee. Thank you so much for such a great presentation to lay the groundwork for what's gonna come for the rest of the day. And I know that you touched on some of the sessions that are happening, but I have a question for both of you. So which talks are you particularly looking forward to?
- For me, I'm really interested to see the Sisense talk.
Kind of like what I mentioned a bit earlier, when we develop some of these connectors, it's particularly gratifying to see other companies seize on them and then build really cool things on top of them. And I think that the Sisense integration really qualifies for that. So that's the one that I'm most looking forward to seeing.
- For me, I'm really, really excited for Will Reynolds's talk on using GRANDstack in the real world. I got a sneak peek of this talk last week, and it's really, really neat. So Will works for a company called Hoare Lea in the UK, that's sort of an architecture and design company. And they use Neo4j and GraphQL to pull in data from lots of different systems that are keeping track of how they're designing systems and architecture, and how the systems interact. And they've built a lot of really, really interesting and compelling visual tools as well. So that should be a really exciting one.
- It's a little hard to pick favorites though, because Dave Fauth's talk on Kafka is going to be really good and have a lot of technical insight. And obviously the Spark talk, that's something that's been near and dear to my heart for the last couple of months. So I can't leave that one out either, but basically we tried to put together an agenda that has a little bit of something for everybody, and a lot of applicability across the space.
- [Mary] Well, it definitely sounds like a super exciting day for sure. We have had a couple of questions that have come in from the audience, so I wanna make sure and address those. And Will, I think you touched on this earlier, but we can maybe dive a little deeper into it. And that's, what's the relationship between the Neo4j Connectors that are supported and what's going on in Neo4j Labs?
- Yeah, that's a great question. So the idea of the Neo4j Connectors is that these are supported integrations for Neo4j customers with an enterprise license.
So this means that they have all of the support that they can expect with any other aspect of Neo4j products. When it comes to Neo4j Labs, the idea here is to build more experimental integrations with Neo4j and other technologies, more as a validation phase. So we wanna make sure that we're addressing not only just bringing graphs in Neo4j to other technologies, but also addressing new and emerging developer trends that we're seeing. So when projects are within Labs, well, you know, they're not officially supported. We support them on a best-effort basis, but really the goal there is validating and building out these integrations and extensions. Having said that, there are many Neo4j users and customers that feel that these Labs projects provide enough value that they do want to use those in production. And we have seen a few of those. So that's always exciting to see as well.
- [Mary] Great. And I think a follow on to this question, and again I think you touched on it, but if someone wanted to learn more about the connectors and Labs, where would they go?
- The supported connectors have their own landing spot on our website. And actually, since I'm sharing my screen, I can show you what that looks like. Within the Neo4j developer pages, we have this particular site here where you just simply go to the Neo4j developer guides and you look for Neo4j tools and integrations, and the supported connectors are listed right here. And these links go straight to pages where you can download and see the documentation for each of those connectors. In the case of the Connector for Apache Spark, we can click on that link and then go straight into the documentation and see the quick start process. So all of the supported connectors can be found right here in the Neo4j developer pages. Labs, on the other hand, has its own separate page. It's neo4j.com/labs, where you can find out a lot more about Labs, see current projects, and get frequently asked questions about it.
One of the things that's great about the Labs page is that each of the different Labs projects has its own miniature page where you can click on it and learn a lot more details about just that particular Labs investment. So we've tried to make this as simple as possible. If you ever run into any questions or comments, you can always either reach out to your Neo4j support person, or you can go to community.neo4j.com, which is our community site for graphs, and get feedback and input from the entire rest of the community.
- [Mary] Great. Thank you so much. And then it's funny that you're on the GRANDstack page, because we also had a question asking you to tell us again where we can find that GRANDstack book.
- Yeah, there's a free download version of the book available at grandstack.io/ebook, and also grandstack.io is another great site where you can find not just the documentation for the Neo4j GraphQL integration, but you can also find videos and blog posts from Neo4j users that are sharing things that they've built. You can also find the Neo4j live stream where, every Thursday, we go through and start building new applications from scratch. Like we've done a real estate search application, and that travel guide that I mentioned. And I think we're gonna get ready to start on building a podcasting app from scratch, using GraphQL and Neo4j, so lots of really good resources out there.
- [Mary] Great. Thank you so much. And I just wanna let the audience know we are running a little short on time. In fact, I think we've gone over a little bit. So if your question wasn't answered, you can also shoot an email over to our webinar @neo4j.com. And we'll be sure to get your question routed to the appropriate person. I wanna thank you again, Will and David, for your time and for kicking off our connections today. And for those of you in the audience, stay tuned, our next session will be starting shortly. Thanks so much.