10 Machine Learning Blueprints You Should Know for Cybersecurity
Contributors
About the author
About the reviewers
Preface
Who this book is for
What this book covers
To get the most out of this book
Download the example code files
Conventions used
Get in touch
Share Your Thoughts
Download a free PDF copy of this book
Chapter 1: On Cybersecurity and Machine Learning
The basics of cybersecurity
Traditional principles of cybersecurity
Modern cybersecurity – a multi-faceted issue
Privacy
An overview of machine learning
Machine learning workflow
Supervised learning
Unsupervised learning
Semi-supervised learning
Evaluation metrics
Machine learning – cybersecurity versus other domains
Summary
Chapter 2: Detecting Suspicious Activity
Technical requirements
Basics of anomaly detection
What is anomaly detection?
Introducing the NSL-KDD dataset
Statistical algorithms for intrusion detection
Univariate outlier detection
Elliptic envelope
Local outlier factor
Machine learning algorithms for intrusion detection
Density-based scan (DBSCAN)
One-class SVM
Isolation forest
Autoencoders
Summary
Chapter 3: Malware Detection Using Transformers and BERT
Technical requirements
Basics of malware
What is malware?
Types of malware
Malware detection
Malware detection methods
Malware analysis
Transformers and attention
Understanding attention
Understanding transformers
Understanding BERT
Detecting malware with BERT
Malware as language
The relevance of BERT
Getting the data
Preprocessing the data
Building a classifier
Summary
Chapter 4: Detecting Fake Reviews
Technical requirements
Reviews and integrity
Why fake reviews exist
Evolution of fake reviews
Statistical analysis
Exploratory data analysis
Feature extraction
Statistical tests
Modeling fake reviews with regression
Ordinary Least Squares regression
OLS assumptions
Interpreting OLS regression
Implementing OLS regression
Summary
Chapter 5: Detecting Deepfakes
Technical requirements
All about deepfakes
A foray into GANs
How are deepfakes created?
The social impact of deepfakes
Detecting fake images
A naive model to detect fake images
Detecting deepfake videos
Building deepfake detectors
Summary
Chapter 6: Detecting Machine-Generated Text
Technical requirements
Text generation models
Understanding GPT
Naïve detection
Creating the dataset
Feature exploration
Using machine learning models for detecting text
Playing around with the model
Automatic feature extraction
Transformer methods for detecting automated text
Compare and contrast
Summary
Chapter 7: Attributing Authorship and How to Evade It
Technical requirements
Authorship attribution and obfuscation
What is authorship attribution?
What is authorship obfuscation?
Techniques for authorship attribution
Dataset
Feature extraction
Training the attributor
Improving authorship attribution
Techniques for authorship obfuscation
Improving obfuscation techniques
Summary
Chapter 8: Detecting Fake News with Graph Neural Networks
Technical requirements
An introduction to graphs
What is a graph?
Representing graphs
Graphs in the real world
Machine learning on graphs
Traditional graph learning
Graph embeddings
GNNs
Fake news detection with GNN
Modeling a GNN
The UPFD framework
Dataset and setup
Implementing GNN-based fake news detection
Playing around with the model
Summary
Chapter 9: Attacking Models with Adversarial Machine Learning
Technical requirements
Introduction to AML
The importance of ML
Adversarial attacks
Adversarial tactics
Attacking image models
FGSM
PGD
Attacking text models
Manipulating text
Further attacks
Developing robustness against adversarial attacks
Adversarial training
Defensive distillation
Gradient regularization
Input preprocessing
Ensemble methods
Certified defenses
Summary
Chapter 10: Protecting User Privacy with Differential Privacy
Technical requirements
The basics of privacy
Core elements of data privacy
Privacy and the GDPR
Privacy by design
Privacy and machine learning
Differential privacy
What is differential privacy?
Differential privacy – a real-world example
Benefits of differential privacy
Differentially private machine learning
IBM Diffprivlib
Credit card fraud detection with differential privacy
Differentially private deep learning
DP-SGD algorithm
Implementation
Differential privacy in practice
Summary
Chapter 11: Protecting User Privacy with Federated Machine Learning
Technical requirements
An introduction to federated machine learning
Privacy challenges in machine learning
How federated machine learning works
The benefits of federated learning
Challenges in federated learning
Implementing federated averaging
Importing libraries
Dataset setup
Client setup
Model implementation
Weight scaling
Global model initialization
Setting up the experiment
Putting it all together
Reviewing the privacy-utility trade-off in federated learning
Global model (no privacy)
Local model (full privacy)
Understanding the trade-off
Beyond the MNIST dataset
Summary
Chapter 12: Breaking into the Sec-ML Industry
Study guide for machine learning and cybersecurity
Machine learning theory
Hands-on machine learning
Cybersecurity
Interview questions
Theory-based questions
Experience-based questions
Conceptual questions
Additional project blueprints
Improved intrusion detection
Adversarial attacks on intrusion detection
Hate speech and toxicity detection
Detecting fake news and misinformation
Summary
Index
Why subscribe?
Other Books You May Enjoy
Packt is searching for authors like you
Share Your Thoughts
Download a free PDF copy of this book
Machine learning in security is harder than in other domains because of the changing nature and abilities of adversaries, the high stakes, and the lack of ground-truth data. This book will prepare machine learning practitioners to handle tasks effectively in the challenging yet exciting cybersecurity space.
The book begins by helping you understand how advanced ML algorithms work and shows you practical examples of how they can be applied to security-specific problems with Python, either by using open source datasets or by guiding you to create your own. In one exercise, you'll also use GPT-3.5, the secret sauce behind ChatGPT, to generate an artificial dataset of fabricated news. Later, you'll find out how to apply the expert knowledge and human-in-the-loop decision-making that are necessary in the cybersecurity space. This book is designed to address the lack of proper resources for individuals interested in transitioning into a data scientist role in cybersecurity. It concludes with case studies, interview questions, and blueprints for four projects that you can use to enhance your portfolio.
By the end of this book, you'll be able to apply machine learning algorithms to detect malware, fake news, deepfakes, and more, along with implementing privacy-preserving machine learning techniques such as differentially private ML.
This book is for machine learning practitioners interested in applying their skills to solve cybersecurity issues. Cybersecurity workers looking to leverage ML methods will also find this book useful. An understanding of fundamental machine learning concepts and beginner-level knowledge of Python programming are required to grasp the concepts in this book. Whether you're a beginner or an experienced professional, this book offers a unique and valuable learning experience that will help you develop the skills needed to protect your network and data against an ever-evolving threat landscape.