Web Scraping with Python: Data Extraction from the Modern Web, 3rd Edition

Author: Ryan Mitchell

Publication date: 2024

Publisher: O’Reilly Media, Inc.

Pages: 352

File size: 2.1 MB

File type: PDF

Cover....1

Table of Contents....4

Preface....10

What Is Web Scraping?....10

Why Web Scraping?....11

About This Book....12

Conventions Used in This Book....13

Using Code Examples....14

O’Reilly Online Learning....15

How to Contact Us....15

Acknowledgments....16

Part I. Building Scrapers....18

Chapter 1. How the Internet Works....20

Networking....21

Physical Layer....22

Data Link Layer....22

Network Layer....23

Transport Layer....23

Session Layer....24

Presentation Layer....24

Application Layer....24

HTML....24

CSS....26

JavaScript....28

Watching Websites with Developer Tools....30

Chapter 2. The Legalities and Ethics of Web Scraping....34

Trademarks, Copyrights, Patents, Oh My!....34

Trespass to Chattels....38

The Computer Fraud and Abuse Act....40

robots.txt and Terms of Service....41

Three Web Scrapers....45

eBay v. Bidder’s Edge and Trespass to Chattels....45

United States v. Auernheimer and the Computer Fraud and Abuse Act....46

Field v. Google: Copyright and robots.txt....48

Chapter 3. Applications of Web Scraping....50

Classifying Projects....50

E-commerce....51

Marketing....52

Academic Research....53

Product Building....54

Travel....55

Sales....56

SERP Scraping....57

Chapter 4. Writing Your First Web Scraper....58

Installing and Using Jupyter....58

Connecting....60

An Introduction to BeautifulSoup....61

Installing BeautifulSoup....61

Running BeautifulSoup....63

Connecting Reliably and Handling Exceptions....66

Chapter 5. Advanced HTML Parsing....70

Another Serving of BeautifulSoup....70

find() and find_all() with BeautifulSoup....72

Other BeautifulSoup Objects....74

Navigating Trees....75

Regular Expressions....79

Regular Expressions and BeautifulSoup....83

Accessing Attributes....84

Lambda Expressions....85

You Don’t Always Need a Hammer....86

Chapter 6. Writing Web Crawlers....88

Traversing a Single Domain....88

Crawling an Entire Site....92

Collecting Data Across an Entire Site....95

Crawling Across the Internet....98

Chapter 7. Web Crawling Models....104

Planning and Defining Objects....105

Dealing with Different Website Layouts....108

Structuring Crawlers....113

Crawling Sites Through Search....113

Crawling Sites Through Links....116

Crawling Multiple Page Types....118

Thinking About Web Crawler Models....120

Chapter 8. Scrapy....122

Installing Scrapy....122

Initializing a New Spider....123

Writing a Simple Scraper....124

Spidering with Rules....125

Creating Items....130

Outputting Items....132

The Item Pipeline....133

Logging with Scrapy....136

More Resources....136

Chapter 9. Storing Data....138

Media Files....138

Storing Data to CSV....141

MySQL....143

Installing MySQL....144

Some Basic Commands....146

Integrating with Python....149

Database Techniques and Good Practice....152

“Six Degrees” in MySQL....154

Email....157

Part II. Advanced Scraping....160

Chapter 10. Reading Documents....162

Document Encoding....162

Text....163

Text Encoding and the Global Internet....164

CSV....168

Reading CSV Files....168

PDF....170

Microsoft Word and .docx....172

Chapter 11. Working with Dirty Data....176

Cleaning Text....177

Working with Normalized Text....181

Cleaning Data with Pandas....183

Cleaning....185

Indexing, Sorting, and Filtering....188

More About Pandas....189

Chapter 12. Reading and Writing Natural Languages....190

Summarizing Data....191

Markov Models....195

Six Degrees of Wikipedia: Conclusion....198

Natural Language Toolkit....201

Installation and Setup....201

Statistical Analysis with NLTK....202

Lexicographical Analysis with NLTK....205

Additional Resources....208

Chapter 13. Crawling Through Forms and Logins....210

Python Requests Library....210

Submitting a Basic Form....211

Radio Buttons, Checkboxes, and Other Inputs....214

Submitting Files and Images....215

Handling Logins and Cookies....216

HTTP Basic Access Authentication....217
