Starting Data Analytics with Generative AI and Python

Authors: Guja Artur, Siwiak Marian, Siwiak Marlena
Published: 2025
Publisher: Manning Publications Co.
Pages: 362
File size: 6.8 MB
File type: PDF

Starting Data Analytics with Generative AI and Python

contents

foreword

preface

acknowledgments

about this book

Who should read this book

How to read this book

The business data used in the book

About the code

liveBook discussion forum

about the authors

about the cover illustration

1 Introduction to the use of generative AI in data analytics

1.1 Inherent limitations of generative AI models

1.2 The role of generative AIs in data analytics

1.2.1 Generative AI in the data analytics flow

1.2.2 The complementarity of language models and other data analytics tools

1.2.3 Limits of generative AIs’ ability to automate and streamline data analytics processes

1.3 Getting started with generative AIs for data analytics

1.3.1 Web interface

1.3.2 Beware of tokens

1.3.3 Accessing and using the API

1.3.4 Third-party integrations of generative AI models

1.3.5 Running LLMs locally

1.3.6 Best practices and tips for successful generative AI implementation

Summary

2 Using generative AI to ensure sufficient data quality

2.1 On a whimsy of fortune

2.2 A note on best practices

2.3 Getting started

2.4 Quality assessment structure

2.4.1 Data cleaning steps

2.4.2 Exploratory data analysis elements

2.5 Data cleaning

2.5.1 Removing duplicates

2.5.2 Handling missing values

2.5.3 Correcting data entry errors

2.5.4 Data validation

2.6 Exploratory data analysis

2.6.1 Reviewing score distribution

2.6.2 Time series exploration

2.6.3 Mysterious variable investigation

2.6.4 Harmonizing data

Summary

3 Descriptive analysis and statistical inference supported by generative AI

3.1 Research questions

3.2 Analysis design

3.3 Descriptive data analysis

3.3.1 Popularity of product categories

3.3.2 Performance of products in their categories and regions

3.3.3 Review scores distribution

3.3.4 Order status

3.4 Inferential analysis

3.4.1 Before you begin

3.4.2 Relationship between product attributes and shipping costs

3.4.3 Relationship between product, transaction, shipping attributes, and the review score

3.4.4 Differences in sales performance and customer satisfaction between sellers

Summary

4 Using generative AI for result interpretations

4.1 Problem definition

4.2 Popularity of product categories

4.3 Performance of products in their categories and regions

4.4 Review scores distribution analysis

4.5 Order status

4.6 Relationship between product attributes and the shipping costs

4.7 Relationship between product, transaction, shipping attributes, and the review score

4.8 Differences in sales performance and customer satisfaction between sellers

Summary

5 Basic text mining using generative AI

5.1 Text mining in the era of generative AI

5.1.1 Generative AI is a game changer

5.1.2 Beware of AI intimidation

5.1.3 Unpacking the constraints

5.2 Preparing for analysis

5.2.1 Data quality

5.2.2 Customer feedback preparation example

5.3 Frequency analysis

5.3.1 What can we learn from frequency analysis of customer reviews?

5.3.2 Direct frequency analysis with generative AI

5.3.3 Uploading a data file to ChatGPT for frequency analysis

5.3.4 Extracting the most common words

5.3.5 Extracting the most common phrases

5.3.6 Understanding the output

5.4 Co-occurrence analysis

5.4.1 What can we learn from co-occurrence analysis?

5.4.2 Co-occurrence analysis in practice

5.4.3 Understanding the output

5.5 Keyword search

5.5.1 What can we learn from keyword search?

5.5.2 Generating keywords with generative AI

5.5.3 Generating keywords in practice

5.5.4 Searching for keywords

5.5.5 Improving keyword search

5.5.6 Comparing generative AIs: Code snippets for positive review searches

5.5.7 Seeking analytical inspiration

5.6 Dictionary-based methods

5.6.1 What can we learn from dictionary-based methods?

5.6.2 Finding resources

5.6.3 Interpreting resources

5.6.4 Adapting the code to chosen resources

5.6.5 Improving dictionary-based search

Summary

6 Advanced text mining with generative AI

6.1 Review analysis

6.2 Sentiment analysis

6.2.1 What can you learn from sentiment analysis?

6.2.2 Direct sentiment analysis with generative AIs

6.2.3 Sentiment analysis with generative AI’s API

6.2.4 Sentiment analysis with machine learning

6.2.5 Sentiment analysis with a suboptimal model

6.2.6 Sentiment analysis on translated inputs

6.2.7 Sentiment analysis with multilingual models

6.2.8 Sentiment analysis with zero-shot learning models

6.2.9 Comparing results of advanced sentiment analysis

6.3 Text summarization

6.3.1 How can you benefit from text summarization?

6.3.2 How can generative AI help in text summarization?

6.3.3 Summarizing text with ChatGPT

6.3.4 Summarizing text with dedicated libraries

6.3.5 Topic modeling

Summary

7 Scaling and performance optimization

7.1 Performance measurement

7.1.1 Execution time

7.1.2 Throughput

7.1.3 Resource utilization

7.2 Improving code performance

7.2.1 Optimizing code

7.2.2 Scaling code

7.3 Cloud-based deployment

7.3.1 What is cloud computing?

7.3.2 Moving your code to the cloud

7.4 Code conversion

Summary

8 Risk, mitigation, and tradeoffs

8.1 The risks of GenAI, in context

8.2 General best practices

8.2.1 AI use policy

8.2.2 Encouraging transparency and accountability

8.2.3 Educating stakeholders

8.2.4 Validating model outputs with expert knowledge

8.3 AI delusion and hallucination risks

8.4 Mitigating misinterpretation and miscommunication risks

8.4.1 Ensuring contextual understanding

8.4.2 Tailoring model prompts and iterative query refinement

8.4.3 Implementing post-processing techniques

8.4.4 Implementing best practices for clearly communicating results

8.4.5 Establishing a feedback loop

8.5 Model bias and fairness risks

8.5.1 Recognizing and identifying bias in model outputs

8.5.2 Applying bias detection and mitigation techniques

8.5.3 Encouraging diversity and ethical use of generative AIs

8.5.4 Continuously monitoring and updating models

8.6 Privacy and security risks

8.6.1 Identifying sensitive data

8.6.2 Data anonymization and pseudonymization

8.6.3 Social engineering and phishing

8.6.4 Compliance with data protection regulations

8.6.5 Regular security audits and assessments

8.6.6 Employee training and awareness

8.7 Legal and compliance risks

8.7.1 Understanding applicable regulations

8.7.2 Intellectual property and licensing

8.7.3 Transparency and explainability

8.7.4 Establishing a compliance framework

8.7.5 Regularly reviewing and updating compliance practices

8.8 Emergent risks

8.8.1 Rogue models

8.8.2 Vulnerable crown jewels

8.8.3 Unknown unknowns

Summary

appendix A Specifying multiple DataFrames to ChatGPT v4

A.1 Conversation recorded on April 1, 2023

appendix B On debugging ChatGPT’s code

B.1 Conversation recorded on April 3, 2023

appendix C On laziness and human errors

C.1 Conversation recorded on April 7, 2023

index

Starting Data Analytics with Generative AI and Python - back

Whether you're a data novice or an experienced pro looking to do more work, faster, Starting Data Analytics with Generative AI and Python is here to help simplify and speed up your data analysis! Written by a pair of world-class data scientists and an experienced risk manager, the book concentrates on the practical analytics tasks you'll do every day.

Inside Starting Data Analytics with Generative AI and Python you’ll learn how to:

  • Write great prompts for ChatGPT
  • Perform end-to-end descriptive analytics
  • Set up an AI-friendly data analytics environment
  • Evaluate the quality of your data
  • Develop a strategic analysis plan
  • Generate code to analyze non-text data
  • Explore text data directly with ChatGPT
  • Prepare reliable reports

In Starting Data Analytics with Generative AI and Python you’ll learn how to improve your coding efficiency, generate new analytical approaches, and fine-tune data pipelines—all assisted by AI tools like ChatGPT. For each step in the data process, you’ll discover how ChatGPT can implement data techniques from simple plain-English prompts. Plus, you’ll develop a vital intuition about the risks and errors that still come with these tools.

About the technology

If you have basic knowledge of data analysis, this book will show you how to use ChatGPT to accelerate your essential data analytics work. The speed-up can be substantial: the authors report needing one third or even one quarter of the time they needed before.

About the book

You’ll find reliable and practical advice that works on the job. Improve problem exploration, generate new analytical approaches, and fine-tune your data pipelines, all while developing an intuition about the risks and errors that still come with AI tools. In the end, you’ll be able to do significantly more work, do it faster, and get better results, without breaking a sweat.

Assuming only that you know the foundations, this friendly book guides you through the entire analysis process: gathering and preparing raw data, cleaning data, generating code-based solutions, selecting statistical tools, and finally creating effective data presentations. With clearly explained prompts to extract, interpret, and present data, it will raise your skills to a whole different level.
