Main algorithms (Neo4j): https://neo4j.com/docs/graph-algorithms/current/introduction/
Example notebook: https://nicolewhite.github.io/neo4j-jupyter/hello-world.html
Getting started with Neo4j and Python: https://marcobonzanini.com/2015/04/06/getting-started-with-neo4j-and-python/
Cypher basics: https://neo4j.com/developer/cypher-query-language/
Cypher case study, Tour de France: https://neo4j.com/graphgist/modeling-the-tour-de-france-2014-in-a-neo4j-graph-database
Centrality algorithms [Degree Centrality / Eigenvector Centrality / Katz Centrality / PageRank / HITS Hubs and Authorities / Closeness Centrality / Betweenness Centrality]: https://aksakalli.github.io/2017/07/17/network-centrality-measures-and-their-visualization.html
from py2neo import Graph
graph = Graph("bolt://localhost:7687", user="neo4j", password="france")
graph.delete_all()
import pandas as pd
data_rider = pd.read_csv('https://raw.githubusercontent.com/inserpio/tour-de-france-2014/master/tour-de-france-2014-0001-teams-and-riders.csv')
data_rider.head(3)
list(data_rider)
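Understanding the query:
Line 1: load the CSV file; each row is bound to csvLine
Line 2: r:Race, MERGE the race node (created on the first row, reused afterwards)
Line 3: t:Team, MERGE one node per team
Line 4: p:Rider, MERGE one node per rider
Line 5: create the relationships and store the rider number, rider info and race year as relationship properties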
query_rider = """
LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/inserpio/tour-de-france-2014/master/tour-de-france-2014-0001-teams-and-riders.csv" AS csvLine
MERGE (r:Race { id: toInteger(csvLine.RACE_ID), name: csvLine.RACE_NAME, from: csvLine.RACE_FROM, to: csvLine.RACE_TO, edition: csvLine.RACE_EDITION, distance: csvLine.RACE_DISTANCE, number_of_stages: csvLine.RACE_NUMBER_OF_STAGES, website: csvLine.RACE_WEBSITE })
MERGE (t:Team { id: toInteger(csvLine.TEAM_ID), name: csvLine.TEAM_NAME, country: csvLine.TEAM_COUNTRY, sportingDirectors: csvLine.TEAM_MANAGERS })
MERGE (p:Rider { name: csvLine.RIDER_NAME, country: csvLine.RIDER_COUNTRY })
MERGE (t)-[:TAKES_PART_IN]->(r) CREATE (r)<-[:TAKES_PART_IN { number: toInteger(csvLine.RIDER_NUMBER), info: csvLine.RIDER_INFO }]-(p), (p)-[:RIDES_FOR { year: toInteger(csvLine.RACE_YEAR) }]->(t);
"""
graph.run(query_rider)
from IPython.display import Image
Image(filename="pictures/Riders_links_info.gif")
#graph.delete_all()
import pandas as pd
data_etapes = pd.read_csv('https://raw.githubusercontent.com/inserpio/tour-de-france-2014/master/tour-de-france-2014-0002-stages.csv')
data_etapes.head(3)
list(data_etapes)
query_etapes = """
LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/inserpio/tour-de-france-2014/master/tour-de-france-2014-0002-stages.csv" AS csvLine
MATCH (r:Race { id: 1 })
MERGE (s:Stage { name: csvLine.STAGE_START + " / " + csvLine.STAGE_FINISH, number: toInteger(csvLine.STAGE_NUMBER), type: csvLine.STAGE_TYPE, date: csvLine.STAGE_DATE, distance: toFloat(csvLine.STAGE_DISTANCE), info: csvLine.STAGE_INFO})
MERGE (cs: City { name: csvLine.STAGE_START, country: csvLine.STAGE_START_COUNTRY, lat: toFloat(csvLine.STAGE_START_LATITUDE), lon: toFloat(csvLine.STAGE_START_LONGITUDE) })
MERGE (cf: City { name: csvLine.STAGE_FINISH, country: csvLine.STAGE_FINISH_COUNTRY, lat: toFloat(csvLine.STAGE_FINISH_LATITUDE), lon: toFloat(csvLine.STAGE_FINISH_LONGITUDE) })
CREATE (s)-[:IS_A_STAGE_OF]->(r), (s)-[:STARTS_FROM]->(cs), (s)-[:FINISHED_AT]->(cf);
"""
graph.run(query_etapes)
from IPython.display import Image
Image(filename="pictures/Etapes_links.gif")
import pandas as pd
data_climbs = pd.read_csv('https://raw.githubusercontent.com/inserpio/tour-de-france-2014/master/tour-de-france-2014-0003-climbs.csv')
data_climbs.head(3)
list(data_climbs)
query_climbs = """
LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/inserpio/tour-de-france-2014/master/tour-de-france-2014-0003-climbs.csv" AS csvLine
MATCH (s:Stage { number: toInteger(csvLine.STAGE_NUMBER) })
CREATE (s)-[:INCLUDES]->(c:Climb { startingAtKm: toFloat(csvLine.STARTING_AT_KM), name: csvLine.NAME, initialAltitude: toFloat(csvLine.INITIAL_ALTITUDE), averageSlope: toFloat(csvLine.AVERAGE_SLOPE), distance: toFloat(csvLine.DISTANCE), category: csvLine.CATEGORY });
"""
graph.run(query_climbs)
Image(filename="pictures/climbs.PNG")
import pandas as pd
data_intermediate_sprints = pd.read_csv('https://raw.githubusercontent.com/inserpio/tour-de-france-2014/master/tour-de-france-2014-0004-intermediate_sprints.csv')
data_intermediate_sprints.head(3)
# Load the intermediate-sprints file (not the climbs file again, which would duplicate every :Climb node).
# Assumptions: the sprints CSV has a STAGE_NUMBER column like the climbs CSV (check list(data_intermediate_sprints)),
# and :IntermediateSprint is a hypothetical label. SET i = csvLine copies every CSV column as a string property.
query_intermediate_sprints = """
LOAD CSV WITH HEADERS FROM "https://raw.githubusercontent.com/inserpio/tour-de-france-2014/master/tour-de-france-2014-0004-intermediate_sprints.csv" AS csvLine
MATCH (s:Stage { number: toInteger(csvLine.STAGE_NUMBER) })
CREATE (s)-[:INCLUDES]->(i:IntermediateSprint)
SET i = csvLine;
"""
graph.run(query_intermediate_sprints)
Image(filename="pictures/intermediate_sprints.PNG")
questions_teams_per_country = """
MATCH (t:Team) RETURN DISTINCT t.country, collect(t.name), count(t.name) AS teamsPerCountry ORDER BY teamsPerCountry DESC;
"""
response_teams_per_country = graph.run(questions_teams_per_country)
# Each record is one row: country, list of team names, number of teams
for d in response_teams_per_country:
    print(d)
graph.run(questions_teams_per_country).to_data_frame()
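The same aggregation can be cross-checked in pandas without querying Neo4j. This is a minimal sketch: the column names TEAM_NAME, TEAM_COUNTRY and RIDER_NAME are the ones read by `query_rider`, but the rows below are made up for illustration. In the notebook, `data_rider` can be used in place of `rows`.

```python
import pandas as pd

# Hypothetical rows shaped like the riders CSV (one row per rider, team repeated on each row)
rows = pd.DataFrame({
    "TEAM_NAME": ["Team A", "Team A", "Team B", "Team C", "Team C"],
    "TEAM_COUNTRY": ["FRA", "FRA", "FRA", "ITA", "ITA"],
    "RIDER_NAME": ["Rider 1", "Rider 2", "Rider 3", "Rider 4", "Rider 5"],
})

# Keep one row per team, mirroring the single :Team node that MERGE creates per team
teams = rows.drop_duplicates(subset=["TEAM_NAME", "TEAM_COUNTRY"])

# Equivalent of: RETURN t.country, collect(t.name), count(t.name) AS teamsPerCountry ORDER BY teamsPerCountry DESC
teams_per_country = (
    teams.groupby("TEAM_COUNTRY")["TEAM_NAME"]
    .agg(teams=list, teamsPerCountry="count")
    .sort_values("teamsPerCountry", ascending=False)
    .reset_index()
)
print(teams_per_country)
```

The deduplication step matters only on the raw CSV; in the graph, MERGE has already collapsed repeated teams into one node each.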
questions_riders_per_country = """
MATCH (r:Rider) RETURN DISTINCT r.country, count(r.name) AS ridersPerCountry ORDER BY ridersPerCountry DESC;
"""
graph.run(questions_riders_per_country).to_data_frame().head()
Goal: graph algorithms compute metrics for graphs, nodes, or relationships. They provide insights into relevant
- entities in the graph (centralities, ranking)
- inherent structures such as communities (community detection, graph partitioning, clustering).
How do they work? Many graph algorithms are iterative: they repeatedly traverse the graph using random walks, breadth-first or depth-first searches, or pattern matching.
Why is it hard? Because the number of possible paths grows exponentially with distance, many of these approaches have high algorithmic complexity.
Fortunately, optimized algorithms exist that exploit certain structures of the graph, reuse already explored parts, and parallelize operations.
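To make the "iterative" point concrete, here is a minimal pure-Python sketch of PageRank computed by power iteration on a five-edge toy network. Assumptions: the damping factor 0.85 and the tolerance are conventional choices, not values taken from Neo4j, and each undirected edge is treated as a link in both directions. It illustrates the technique; it is not how the Neo4j library implements PageRank.

```python
def pagerank(edges, damping=0.85, tol=1e-10, max_iter=100):
    """Power-iteration PageRank on an undirected edge list."""
    # Build adjacency lists; an undirected edge is a link in each direction
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, []).append(v)
        neighbors.setdefault(v, []).append(u)
    nodes = sorted(neighbors)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}  # start from a uniform distribution
    for _ in range(max_iter):
        # Teleport term, then each node spreads its current rank evenly over its neighbours
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            share = damping * rank[node] / len(neighbors[node])
            for nb in neighbors[node]:
                new_rank[nb] += share
        # Stop once the total change between two sweeps is negligible
        delta = sum(abs(new_rank[x] - rank[x]) for x in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


my_network = [("Adrien", "Charles"), ("Charles", "Louis"), ("Charles", "Yann"), ("Yann", "Adrien"), ("Louis", "Yann")]
scores = pagerank(my_network)
for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.3f}")
```

Each sweep touches every edge once, so the cost per iteration is linear in the graph size; the number of iterations is what the convergence test controls.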
https://neo4j.com/docs/graph-algorithms/current/introduction/
Centrality:
- PageRank
- ArticleRank
- Betweenness Centrality
- Closeness Centrality
- Harmonic Centrality
- Eigenvector Centrality
- Degree Centrality
Community detection:
- Louvain
- Label Propagation
- Connected Components
- Strongly Connected Components
- Triangle Counting / Clustering Coefficient
- Balanced Triads
Path finding:
- Minimum Weight Spanning Tree
- Shortest Path
- Single Source Shortest Path
- All Pairs Shortest Path
- A*
- Yen’s K-shortest paths
- Random Walk
Link prediction:
- Adamic Adar
- Common Neighbors
- Preferential Attachment
- Resource Allocation
- Same Community
- Total Neighbors
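Several of these algorithms have counterparts in networkx, the library used below in this notebook. The following sketch runs a handful of them on the small "my_network" graph. The function names are networkx's, not Neo4j procedure names, and normalization conventions may differ from Neo4j's, so scores are not directly comparable. ArticleRank, Balanced Triads and a few others are not covered here.

```python
import networkx as nx

edges = [("Adrien", "Charles"), ("Charles", "Louis"), ("Charles", "Yann"), ("Yann", "Adrien"), ("Louis", "Yann")]
G = nx.Graph(edges)

# Centrality
degree = nx.degree_centrality(G)            # degree / (n - 1)
betweenness = nx.betweenness_centrality(G)  # share of shortest paths through a node
closeness = nx.closeness_centrality(G)      # (n - 1) / sum of distances
harmonic = nx.harmonic_centrality(G)        # sum of 1 / distance
eigenvector = nx.eigenvector_centrality(G)

# Community detection / structure
components = list(nx.connected_components(G))
triangles = nx.triangles(G)
clustering = nx.clustering(G)

# Path finding
path = nx.shortest_path(G, "Adrien", "Louis")
mst = nx.minimum_spanning_tree(G)

# Link prediction for the only pair that is not yet connected
pair = [("Adrien", "Louis")]
adamic_adar = next(nx.adamic_adar_index(G, pair))[2]
common = sorted(nx.common_neighbors(G, "Adrien", "Louis"))
pref_attachment = next(nx.preferential_attachment(G, pair))[2]
resource_alloc = next(nx.resource_allocation_index(G, pair))[2]

print(degree)
print(path, common, pref_attachment, round(adamic_adar, 3))
```

Adrien and Louis share two neighbours of degree 3, which is why both Adamic Adar (2 / ln 3) and Resource Allocation (2 / 3) score the pair as a plausible future link.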
import networkx as nx

my_network = [("Adrien", "Charles"), ("Charles", "Louis"), ("Charles", "Yann"), ("Yann", "Adrien"), ("Louis", "Yann")]
G_symmetric = nx.Graph()     # undirected: an edge has no direction
G_asymmetric = nx.DiGraph()  # directed: an edge goes from source to target
for source, target in my_network:
    # Symmetric (undirected) graph
    G_symmetric.add_edge(source, target)
    # Asymmetric (directed) graph
    G_asymmetric.add_edge(source, target)
nx.draw_networkx(G_symmetric)
nx.draw_networkx(G_asymmetric)
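The two drawings look alike, but the directed and undirected versions of the same edge list behave differently. A small sketch comparing degrees and edge lookups, using only standard networkx Graph/DiGraph methods:

```python
import networkx as nx

my_network = [("Adrien", "Charles"), ("Charles", "Louis"), ("Charles", "Yann"), ("Yann", "Adrien"), ("Louis", "Yann")]
G_symmetric = nx.Graph(my_network)
G_asymmetric = nx.DiGraph(my_network)

# Undirected: every edge adds 1 to the degree of both endpoints
undirected_degree = dict(G_symmetric.degree())
# Directed: the same edges are split into outgoing and incoming links
out_degree = dict(G_asymmetric.out_degree())
in_degree = dict(G_asymmetric.in_degree())
print(undirected_degree, out_degree, in_degree)

# An undirected edge can be read both ways; a directed one only from source to target
print(G_symmetric.has_edge("Charles", "Adrien"))   # True
print(G_asymmetric.has_edge("Charles", "Adrien"))  # False
```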
# Each tuple is (source, target, weight)
my_network = [("Adrien", "Charles", 2), ("Charles", "Louis", 18), ("Charles", "Yann", 30), ("Yann", "Adrien", 5), ("Louis", "Yann", 11)]
G_weighted = nx.Graph()
for source, target, weight in my_network:
    G_weighted.add_edge(source, target, weight=weight)
nx.draw_networkx(G_weighted)
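Weights only matter once an algorithm reads them. A short sketch contrasting the fewest-hops path with Dijkstra's cheapest path on the same weighted network, assuming the weight is meant as a cost (for example a distance) rather than a strength:

```python
import networkx as nx

my_network = [("Adrien", "Charles", 2), ("Charles", "Louis", 18), ("Charles", "Yann", 30), ("Yann", "Adrien", 5), ("Louis", "Yann", 11)]
G_weighted = nx.Graph()
G_weighted.add_weighted_edges_from(my_network)  # stores the third value as the "weight" attribute

# Fewest hops: weights are ignored
hops = nx.shortest_path_length(G_weighted, "Adrien", "Louis")
# Cheapest route: Dijkstra sums the "weight" attribute along the path
path = nx.dijkstra_path(G_weighted, "Adrien", "Louis", weight="weight")
cost = nx.dijkstra_path_length(G_weighted, "Adrien", "Louis", weight="weight")
print(hops, path, cost)  # 2 ['Adrien', 'Yann', 'Louis'] 16
```

Both routes from Adrien to Louis take two hops, but going through Yann costs 5 + 11 = 16 against 2 + 18 = 20 through Charles.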
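import networkx as nx

PATH_FACEBOOK = "dataset/facebook_combined.txt"
G_fb = nx.read_edgelist(PATH_FACEBOOK, create_using=nx.Graph(), nodetype=int)
# Graph size (nx.info() is not available in networkx 3.x)
print(f"{G_fb.number_of_nodes()} nodes, {G_fb.number_of_edges()} edges")
# Thousands of nodes: hide labels and shrink nodes so the drawing stays readable
nx.draw_networkx(G_fb, pos=nx.spring_layout(G_fb, seed=42), node_size=10, with_labels=False)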
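On a graph of this size, a drawing says little and exact betweenness is slow (Brandes' algorithm is O(nm) on unweighted graphs). Below is a hedged sketch of a numeric summary. `summarize` is a hypothetical helper, and betweenness is approximated by sampling `k` source nodes through the `k` parameter of `nx.betweenness_centrality`. It runs on networkx's built-in Zachary karate club graph as a stand-in; in the notebook, pass `G_fb` instead.

```python
import networkx as nx


def summarize(G, k=100, seed=42):
    """Return basic statistics for an undirected graph (hypothetical helper)."""
    n = G.number_of_nodes()
    # Sample at most k source nodes; with k >= n the result is exact
    betweenness = nx.betweenness_centrality(G, k=min(k, n), seed=seed)
    return {
        "nodes": n,
        "edges": G.number_of_edges(),
        "density": nx.density(G),
        "average_clustering": nx.average_clustering(G),
        "components": nx.number_connected_components(G),
        "top_betweenness_node": max(betweenness, key=betweenness.get),
    }


# Stand-in graph so the sketch runs without the Facebook file
result = summarize(nx.karate_club_graph())
print(result)
```

Raising `k` trades speed for accuracy; the seed keeps the sampled sources, and therefore the scores, reproducible between runs.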