Graph Algorithms for Data Science: With examples in Neo4j

Author: Bratanic Tomaž

Publication date: 2024

Publisher: Manning Publications Co.

Number of pages: 353

File size: 4.7 MB

File type: PDF

Graph Algorithms for Data Science
brief contents
contents
foreword
preface
acknowledgments
about this book
Who should read this book
How this book is organized
About the code
liveBook discussion forum
about the author
about the cover illustration

Part 1: Introduction to graphs
1 Graphs and network science: An introduction
1.1 Understanding data through relationships
1.2 How to spot a graph-shaped problem
1.2.1 Self-referencing relationships
1.2.2 Pathfinding networks
1.2.3 Bipartite graphs
1.2.4 Complex networks
Summary
2 Representing network structure: Designing your first graph model
2.1 Graph terminology
2.1.1 Directed vs. undirected graph
2.1.2 Weighted vs. unweighted graphs
2.1.3 Bipartite vs. monopartite graphs
2.1.4 Multigraph vs. simple graph
2.1.5 A complete graph
2.2 Network representations
2.2.1 Labeled-property graph model
2.3 Designing your first labeled-property graph model
2.3.1 Follower network
2.3.2 User–tweet network
2.3.3 Retweet network
2.3.4 Representing graph schema
2.4 Extracting knowledge from text
2.4.1 Links
2.4.2 Hashtags
2.4.3 Mentions
2.4.4 Final Twitter social network schema
Summary

Part 2: Network analysis
3 Your first steps with Cypher query language
3.1 Cypher query language clauses
3.1.1 CREATE clause
3.1.2 MATCH clause
3.1.3 WITH clause
3.1.4 SET clause
3.1.5 REMOVE clause
3.1.6 DELETE clause
3.1.7 MERGE clause
3.2 Importing CSV files with Cypher
3.2.1 Clean up the database
3.2.2 Twitter graph model
3.2.3 Unique constraints
3.2.4 LOAD CSV clause
3.2.5 Importing the Twitter social network
3.3 Solutions to exercises
Summary
4 Exploratory graph analysis
4.1 Exploring the Twitter network
4.2 Aggregating data with Cypher query language
4.2.1 Time aggregations
4.3 Filtering graph patterns
4.4 Counting subqueries
4.5 Multiple aggregations in sequence
4.6 Solutions to exercises
Summary
5 Introduction to social network analysis
5.1 Follower network
5.1.1 Node degree distribution
5.2 Introduction to the Neo4j Graph Data Science library
5.2.1 Graph catalog and native projection
5.3 Network characterization
5.3.1 Weakly connected component algorithm
5.3.2 Strongly connected components algorithm
5.3.3 Local clustering coefficient
5.4 Identifying central nodes
5.4.1 PageRank algorithm
5.4.2 Personalized PageRank algorithm
5.4.3 Dropping the named graph
5.5 Solutions to exercises
Summary
6 Projecting monopartite networks
6.1 Translating an indirect multihop path into a direct relationship
6.1.1 Cypher projection
6.2 Retweet network characterization
6.2.1 Degree centrality
6.2.2 Weakly connected components
6.3 Identifying the most influential content creators
6.3.1 Excluding self-loops
6.3.2 Weighted PageRank variant
6.3.3 Dropping the projected in-memory graph
6.4 Solutions to exercises
Summary
7 Inferring co-occurrence networks based on bipartite networks
7.1 Extracting hashtags from tweets
7.2 Constructing the co-occurrence network
7.2.1 Jaccard similarity coefficient
7.2.2 Node similarity algorithm
7.3 Characterization of the co-occurrence network
7.3.1 Node degree centrality
7.3.2 Weakly connected components
7.4 Community detection with the label propagation algorithm
7.5 Identifying community representatives with PageRank
7.5.1 Dropping the projected in-memory graphs
7.6 Solutions to exercises
Summary
8 Constructing a nearest neighbor similarity network
8.1 Feature extraction
8.1.1 Motifs and graphlets
8.1.2 Betweenness centrality
8.1.3 Closeness centrality
8.2 Constructing the nearest neighbor graph
8.2.1 Evaluating features
8.2.2 Inferring the similarity network
8.3 User segmentation with the community detection algorithm
8.4 Solutions to exercises
Summary

Part 3: Graph machine learning
9 Node embeddings and classification
9.1 Node embedding models
9.1.1 Homophily vs. structural roles approach
9.1.2 Inductive vs. transductive embedding models
9.2 Node classification task
9.2.1 Defining a connection to a Neo4j database
9.2.2 Importing a Twitch dataset
9.3 The node2vec algorithm
9.3.1 The word2vec algorithm
9.3.2 Random walks
9.3.3 Calculate node2vec embeddings
9.3.4 Evaluating node embeddings
9.3.5 Training a classification model
9.3.6 Evaluating predictions
9.4 Solutions to exercises
Summary
10 Link prediction
10.1 Link prediction workflow
10.2 Dataset split
10.2.1 Time-based split
10.2.2 Random split
10.2.3 Negative samples
10.3 Network feature engineering
10.3.1 Network distance
10.3.2 Preferential attachment
10.3.3 Common neighbors
10.3.4 Adamic–Adar index
10.3.5 Clustering coefficient of common neighbors
10.4 Link prediction classification model
10.4.1 Missing values
10.4.2 Training the model
10.4.3 Evaluating the model
10.5 Solutions to exercises
Summary
11 Knowledge graph completion
11.1 Knowledge graph embedding model
11.1.1 Triple
11.1.2 TransE
11.1.3 TransE limitations
11.2 Knowledge graph completion
11.2.1 Hetionet
11.2.2 Dataset split
11.2.3 Train a PairRE model
11.2.4 Drug application predictions
11.2.5 Explaining predictions
11.3 Solutions to exercises
Summary
12 Constructing a graph using natural language processing techniques
12.1 Coreference resolution
12.2 Named entity recognition
12.2.1 Entity linking
12.3 Relation extraction
12.4 Implementation of information extraction pipeline
12.4.1 SpaCy
12.4.2 Coreference resolution
12.4.3 End-to-end relation extraction
12.4.4 Entity linking
12.4.5 External data enrichment
12.5 Solutions to exercises
Summary

appendix: The Neo4j environment
A.1 Cypher query language
A.2 Neo4j installation
A.2.1 Neo4j Desktop installation
A.2.2 Neo4j Docker installation
A.2.3 Neo4j Aura
A.3 Neo4j Browser configuration
references
index

Graphs are the natural way to represent and understand connected data. This book explores the most important algorithms and techniques for graphs in data science, with concrete advice on implementation and deployment. You don’t need any graph experience to start benefiting from this insightful guide. These powerful graph algorithms are explained in clear, jargon-free text and illustrations that make them easy to apply to your own projects.

In Graph Algorithms for Data Science you will learn:

Labeled-property graph modeling
Constructing a graph from structured data such as CSV or SQL
NLP techniques to construct a graph from unstructured data
Cypher query language syntax to manipulate data and extract insights
Social network analysis algorithms like PageRank and community detection
How to translate graph structure into ML model input with node embedding models
Using graph features in node classification and link prediction workflows

Graph Algorithms for Data Science is a hands-on guide to working with graph-based data in applications like machine learning, fraud detection, and business data analysis. It’s filled with fascinating and fun projects, demonstrating the ins and outs of graphs. You’ll gain practical skills by analyzing Twitter, building graphs with NLP techniques, and much more.

Foreword by Michael Hunger.

About the technology

A graph, put simply, is a network of connected data. Graphs are an efficient way to identify and explore the significant relationships naturally occurring within a dataset. This book presents the most important algorithms for graph data science with examples from machine learning, business applications, natural language processing, and more.
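
As a rough illustration of what "a network of connected data" can look like in practice, here is a minimal Python sketch. It is not taken from the book: the user names and follower pairs are made up, and a plain adjacency list stands in for a real graph database such as Neo4j.

```python
from collections import defaultdict

# Hypothetical follower data: each pair (a, b) means "a follows b".
follows = [
    ("alice", "bob"),
    ("carol", "bob"),
    ("bob", "dave"),
    ("alice", "dave"),
    ("dave", "bob"),
]

# Adjacency list: each node maps to the set of nodes it points to.
graph = defaultdict(set)
for source, target in follows:
    graph[source].add(target)

# In-degree: the number of incoming relationships per node.
in_degree = defaultdict(int)
for source, targets in graph.items():
    for target in targets:
        in_degree[target] += 1

# The node with the most incoming relationships.
most_followed = max(in_degree, key=in_degree.get)
print(most_followed, in_degree[most_followed])
```

Counting incoming relationships is about the simplest relationship-based measure there is; centrality algorithms such as PageRank, which the book covers, refine the same idea.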

About the book

Graph Algorithms for Data Science shows you how to construct and analyze graphs from structured and unstructured data. In it, you’ll learn to apply graph algorithms like PageRank, community detection/clustering, and knowledge graph models by putting each new algorithm to work in a hands-on data project. This cutting-edge book also demonstrates how you can create graphs that optimize input for AI models using node embedding.

What's inside

Creating knowledge graphs
Node classification and link prediction workflows
NLP techniques for graph construction

About the reader

For data scientists who know machine learning basics. Examples use the Cypher query language, which is explained in the book.
