1 Data, Stats and Stories – An Introduction....1
1.1 From Small to Big Data....2
1.2 Numbers, Facts and Stats....10
1.3 A Sampled History of Statistics....14
1.4 Statistics Today....22
1.5 Asking Questions and Getting Answers....25
1.6 Presenting Answers Visually....30
2 Python Programming Primer....33
2.1 Talking to Python....35
2.1.1 Scripting and Interacting....38
2.1.2 Jupyter Notebook....41
2.2 Starting Up with Python....42
2.2.1 Types in Python....43
2.2.2 Numbers: Integers and Floats....43
2.2.3 Strings....46
2.2.4 Complex Numbers....49
2.3 Collections in Python....51
2.3.1 Lists....52
2.3.2 List Comprehension....60
2.3.3 Tuples....61
2.3.4 Dictionaries....66
2.3.5 Sets....72
2.4 The Beginning of Wisdom: Logic & Control Flow....80
2.4.1 Booleans and Logical Operators....80
2.4.2 Conditional Statements....82
2.4.3 While Loop....85
2.4.4 For Loop....87
2.5 Functions....89
2.6 Scripts and Modules....94
3 Snakes, Bears & Other Numerical Beasts: NumPy, SciPy & pandas....99
3.1 Numerical Python – NumPy....100
3.1.1 Matrices and Vectors....101
3.1.2 N-Dimensional Arrays....102
statisticsanddatavisualisationwithpython ix
3.1.3 N-Dimensional Matrices....104
3.1.4 Indexing and Slicing....107
3.1.5 Descriptive Statistics....109
3.2 Scientific Python – SciPy....112
3.2.1 Matrix Algebra....114
3.2.2 Numerical Integration....116
3.2.3 Numerical Optimisation....117
3.2.4 Statistics....118
3.3 Panel Data = pandas....121
3.3.1 Series and Dataframes....122
3.3.2 Data Exploration with pandas....124
3.3.3 Pandas Data Types....125
3.3.4 Data Manipulation with pandas....126
3.3.5 Loading Data to pandas....130
3.3.6 Data Grouping....136
4 The Measure of All Things – Statistics....141
4.1 Descriptive Statistics....144
4.2 Measures of Central Tendency and Dispersion....145
4.3 Central Tendency....146
4.3.1 Mode....147
4.3.2 Median....150
4.3.3 Arithmetic Mean....152
4.3.4 Geometric Mean....155
4.3.5 Harmonic Mean....159
4.4 Dispersion....163
4.4.1 Setting the Boundaries: Range....163
4.4.2 Splitting One’s Sides: Quantiles, Quartiles, Percentiles and More....166
4.4.3 Mean Deviation....169
4.4.4 Variance and Standard Deviation....171
4.5 Data Description – Descriptive Statistics Revisited....176
5 Definitely Maybe: Probability and Distributions....179
5.1 Probability....180
5.2 Random Variables and Probability Distributions....182
5.2.1 Random Variables....183
5.2.2 Discrete and Continuous Distributions....185
5.2.3 Expected Value and Variance....186
5.3 Discrete Probability Distributions....191
5.3.1 Uniform Distribution....191
5.3.2 Bernoulli Distribution....197
5.3.3 Binomial Distribution....201
5.3.4 Hypergeometric Distribution....208
5.3.5 Poisson Distribution....216
5.4 Continuous Probability Distributions....223
5.4.1 Normal or Gaussian Distribution....224
5.4.2 Standard Normal Distribution Z....235
5.4.3 Shape and Moments of a Distribution....238
5.4.4 The Central Limit Theorem....245
5.5 Hypothesis and Confidence Intervals....247
5.5.1 Student’s t Distribution....253
5.5.2 Chi-squared Distribution....260
6 Alluring Arguments and Ugly Facts – Statistical Modelling and Hypothesis Testing....267
6.1 Hypothesis Testing....268
6.1.1 Tales and Tails: One- and Two-Tailed Tests....273
6.2 Normality Testing....279
6.2.1 Q-Q Plot....280
6.2.2 Shapiro-Wilk Test....282
6.2.3 D’Agostino K-squared Test....285
6.2.4 Kolmogorov-Smirnov Test....288
6.3 Chi-square Test....291
6.3.1 Goodness of Fit....291
6.3.2 Independence....293
6.4 Linear Correlation and Regression....296
6.4.1 Pearson Correlation....296
6.4.2 Linear Regression....301
6.4.3 Spearman Correlation....308
6.5 Hypothesis Testing with One Sample....312
6.5.1 One-Sample t-test for the Population Mean....312
6.5.2 One-Sample z-test for Proportions....316
6.5.3 Wilcoxon Signed Rank with One-Sample....320
6.6 Hypothesis Testing with Two Samples....324
6.6.1 Two-Sample t-test – Comparing Means, Same Variances....325
6.6.2 Levene’s Test – Testing Homoscedasticity....330
6.6.3 Welch’s t-test – Comparing Means, Different Variances....332
6.6.4 Mann-Whitney Test – Testing Non-normal Samples....334
6.6.5 Paired Sample t-test....338
6.6.6 Wilcoxon Matched Pairs....342
6.7 Analysis of Variance....345
6.7.1 One-factor or One-way ANOVA....347
6.7.2 Tukey’s Range Test....360
6.7.3 Repeated Measures ANOVA....361
6.7.4 Kruskal-Wallis – Non-parametric One-way ANOVA....365
6.7.5 Two-factor or Two-way ANOVA....369
6.8 Tests as Linear Models....376
6.8.1 Pearson and Spearman Correlations....377
6.8.2 One-sample t- and Wilcoxon Signed Rank Tests....378
6.8.3 Two-Sample t- and Mann-Whitney Tests....379
6.8.4 Paired Sample t- and Wilcoxon Matched Pairs Tests....380
6.8.5 One-way ANOVA and Kruskal-Wallis Test....380
7 Delightful Details – Data Visualisation....383
7.1 Presenting Statistical Quantities....384
7.1.1 Textual Presentation....385
7.1.2 Tabular Presentation....385
7.1.3 Graphical Presentation....386
7.2 Can You Draw Me a Picture? – Data Visualisation....387
7.3 Design and Visual Representation....394
7.4 Plotting and Visualising: Matplotlib....402
7.4.1 Keep It Simple: Plotting Functions....403
7.4.2 Line Styles and Colours....404
7.4.3 Titles and Labels....405
7.4.4 Grids....406
7.5 Multiple Plots....407
7.6 Subplots....407
7.7 Plotting Surfaces....410
7.8 Data Visualisation – Best Practices....414
8 Dazzling Data Designs – Creating Charts....417
8.1 What Is the Right Visualisation for Me?....417
8.2 Data Visualisation and Python....420
8.2.1 Data Visualisation with Pandas....421
8.2.2 Seaborn....423
8.2.3 Bokeh....425
8.2.4 Plotly....428
8.3 Scatter Plot....430
8.4 Line Chart....438
8.5 Bar Chart....440
8.6 Pie Chart....447
8.7 Histogram....452
8.8 Box Plot....459
8.9 Area Chart....464
8.10 Heatmap....468
A Variance: Population v Sample....477
B Sum of First n Integers....479
C Sum of Squares of the First n Integers....481
D The Binomial Coefficient....483
D.1 Some Useful Properties of the Binomial Coefficient....484
E The Hypergeometric Distribution....485
E.1 The Hypergeometric vs Binomial Distribution....485
F The Poisson Distribution....487
F.1 Derivation of the Poisson Distribution....487
F.2 The Poisson Distribution as a Limit of the Binomial Distribution....488
G The Normal Distribution....491
G.1 Integrating the PDF of the Normal Distribution....491
G.2 Maximum and Inflection Points of the Normal Distribution....493
H Skewness and Kurtosis....495
I Kruskal-Wallis Test – No Ties....497
Bibliography....501
Index....511
This book is intended to serve as a bridge in statistics for graduates and business practitioners who want to apply their skills in data science and analytics, as well as in statistical analysis in general. On the one hand, it is intended as a refresher for readers who have taken some courses in statistics but who have not necessarily used it in their day-to-day work. On the other hand, the material is also suitable for readers who are encountering statistical work in Python for the first time. Statistics and Data Visualisation with Python aims to build statistical knowledge from the ground up. It enables the reader to understand the ideas behind inferential statistics and to begin formulating hypotheses. These ideas form the foundations of the applications and algorithms used in statistical analysis, business analytics, machine learning, and applied machine learning. The book begins with the basics of programming in Python and of data analysis, in order to build a solid basis in statistical methods and hypothesis testing, both of which are useful in many modern applications.