Foundations of Statistical Natural Language Processing

Authors: Christopher David Manning, Hinrich Schütze
Year of publication: 1999
Publisher: The MIT Press
Pages: 720
File size: 3.0 MB
File format: PDF
Added by: codelibs

List of Tables
List of Figures
Table of Notations
Preface
Road Map
I   Preliminaries
    1  Introduction
       1.1  Rationalist and Empiricist Approaches to Language
       1.2  Scientific Content
            1.2.1   Questions that linguistics should answer
            1.2.2   Non-categorical phenomena in language
            1.2.3   Language and cognition as probabilistic phenomena
       1.3  The Ambiguity of Language: Why NLP Is Difficult
       1.4  Dirty Hands
            1.4.1   Lexical resources
            1.4.2   Word counts
            1.4.3   Zipf’s laws
            1.4.4   Collocations
            1.4.5   Concordances
       1.5  Further Reading
       1.6  Exercises
    2  Mathematical Foundations
       2.1  Elementary Probability Theory
            2.1.1   Probability spaces
            2.1.2   Conditional probability and independence
            2.1.3   Bayes’ theorem
            2.1.4   Random variables
            2.1.5   Expectation and variance
            2.1.6   Notation
            2.1.7   Joint and conditional distributions
            2.1.8   Determining P
            2.1.9   Standard distributions
            2.1.10  Bayesian statistics
            2.1.11  Exercises
       2.2  Essential Information Theory
            2.2.1   Entropy
            2.2.2   Joint entropy and conditional entropy
            2.2.3   Mutual information
            2.2.4   The noisy channel model
            2.2.5   Relative entropy or Kullback-Leibler divergence
            2.2.6   The relation to language: Cross entropy
            2.2.7   The entropy of English
            2.2.8   Perplexity
            2.2.9   Exercises
       2.3  Further Reading
    3  Linguistic Essentials
       3.1  Parts of Speech and Morphology
            3.1.1   Nouns and pronouns
            3.1.2   Words that accompany nouns: Determiners and adjectives
            3.1.3   Verbs
            3.1.4   Other parts of speech
       3.2  Phrase Structure
            3.2.1   Phrase structure grammars
            3.2.2   Dependency: Arguments and adjuncts
            3.2.3   X’ theory
            3.2.4   Phrase structure ambiguity
       3.3  Semantics and Pragmatics
       3.4  Other Areas
       3.5  Further Reading
       3.6  Exercises
    4  Corpus-Based Work
       4.1  Getting Set Up
            4.1.1   Computers
            4.1.2   Corpora
            4.1.3   Software
       4.2  Looking at Text
            4.2.1   Low-level formatting issues
            4.2.2   Tokenization: What is a word?
            4.2.3   Morphology
            4.2.4   Sentences
       4.3  Marked-up Data
            4.3.1   Markup schemes
            4.3.2   Grammatical tagging
       4.4  Further Reading
       4.5  Exercises
II  Words
    5  Collocations
       5.1  Frequency
       5.2  Mean and Variance
       5.3  Hypothesis Testing
            5.3.1   The t test
            5.3.2   Hypothesis testing of differences
            5.3.3   Pearson’s chi-square test
            5.3.4   Likelihood ratios
       5.4  Mutual Information
       5.5  The Notion of Collocation
       5.6  Further Reading
    6  Statistical Inference: n-gram Models over Sparse Data
       6.1  Bins: Forming Equivalence Classes
            6.1.1   Reliability vs. discrimination
            6.1.2   n-gram models
            6.1.3   Building n-gram models
       6.2  Statistical Estimators
            6.2.1   Maximum Likelihood Estimation (MLE)
            6.2.2   Laplace’s law, Lidstone’s law and the Jeffreys-Perks law
            6.2.3   Held out estimation
            6.2.4   Cross-validation (deleted estimation)
            6.2.5   Good-Turing estimation
            6.2.6   Briefly noted
       6.3  Combining Estimators
            6.3.1   Simple linear interpolation
            6.3.2   Katz’s backing-off
            6.3.3   General linear interpolation
            6.3.4   Briefly noted
            6.3.5   Language models for Austen
       6.4  Conclusions
       6.5  Further Reading
       6.6  Exercises
    7  Word Sense Disambiguation
       7.1  Methodological Preliminaries
            7.1.1   Supervised and unsupervised learning
            7.1.2   Pseudowords
            7.1.3   Upper and lower bounds on performance
       7.2  Supervised Disambiguation
            7.2.1   Bayesian classification
            7.2.2   An information-theoretic approach
       7.3  Dictionary-Based Disambiguation
            7.3.1   Disambiguation based on sense definitions
            7.3.2   Thesaurus-based disambiguation
            7.3.3   Disambiguation based on translations in a second-language corpus
            7.3.4   One sense per discourse, one sense per collocation
       7.4  Unsupervised Disambiguation
       7.5  What Is a Word Sense?
       7.6  Further Reading
       7.7  Exercises
    8  Lexical Acquisition
       8.1  Evaluation Measures
       8.2  Verb Subcategorization
       8.3  Attachment Ambiguity
            8.3.1   Hindle and Rooth (1993)
            8.3.2   General remarks on PP attachment
       8.4  Selectional Preferences
       8.5  Semantic Similarity
            8.5.1   Vector space measures
            8.5.2   Probabilistic measures
       8.6  The Role of Lexical Acquisition in Statistical NLP
       8.7  Further Reading
III Grammar
    9  Markov Models
       9.1  Markov Models
       9.2  Hidden Markov Models
            9.2.1   Why use HMMs?
            9.2.2   General form of an HMM
       9.3  The Three Fundamental Questions for HMMs
            9.3.1   Finding the probability of an observation
            9.3.2   Finding the best state sequence
            9.3.3   The third problem: Parameter estimation
       9.4  HMMs: Implementation, Properties, and Variants
            9.4.1   Implementation
            9.4.2   Variants
            9.4.3   Multiple input observations
            9.4.4   Initialization of parameter values
       9.5  Further Reading
    10 Part-of-Speech Tagging
       10.1 The Information Sources in Tagging
       10.2 Markov Model Taggers
            10.2.1  The probabilistic model
            10.2.2  The Viterbi algorithm
            10.2.3  Variations
       10.3 Hidden Markov Model Taggers
            10.3.1  Applying HMMs to POS tagging
            10.3.2  The effect of initialization on HMM training
       10.4 Transformation-Based Learning of Tags
            10.4.1  Transformations
            10.4.2  The learning algorithm
            10.4.3  Relation to other models
            10.4.4  Automata
            10.4.5  Summary
       10.5 Other Methods, Other Languages
            10.5.1  Other approaches to tagging
            10.5.2  Languages other than English
       10.6 Tagging Accuracy and Uses of Taggers
            10.6.1  Tagging accuracy
            10.6.2  Applications of tagging
       10.7 Further Reading
       10.8 Exercises
    11 Probabilistic Context Free Grammars
       11.1 Some Features of PCFGs
       11.2 Questions for PCFGs
       11.3 The Probability of a String
            11.3.1  Using inside probabilities
            11.3.2  Using outside probabilities
            11.3.3  Finding the most likely parse for a sentence
            11.3.4  Training a PCFG
       11.4 Problems with the Inside-Outside Algorithm
       11.5 Further Reading
       11.6 Exercises
    12 Probabilistic Parsing
       12.1 Some Concepts
            12.1.1  Parsing for disambiguation
            12.1.2  Treebanks
            12.1.3  Parsing models vs. language models
            12.1.4  Weakening the independence assumptions of PCFGs
            12.1.5  Tree probabilities and derivational probabilities
            12.1.6  There’s more than one way to do it
            12.1.7  Phrase structure grammars and dependency grammars
            12.1.8  Evaluation
            12.1.9  Equivalent models
            12.1.10 Building parsers: Search methods
            12.1.11 Use of the geometric mean
       12.2 Some Approaches
             12.2.1  Non-lexicalized treebank grammars
            12.2.2  Lexicalized models using derivational histories
            12.2.3  Dependency-based models
            12.2.4  Discussion
       12.3 Further Reading
       12.4 Exercises
IV  Applications and Techniques
    13 Statistical Alignment and Machine Translation
       13.1 Text Alignment
            13.1.1 Aligning sentences and paragraphs
            13.1.2 Length-based methods
            13.1.3 Offset alignment by signal processing techniques
            13.1.4 Lexical methods of sentence alignment
            13.1.5 Summary
            13.1.6 Exercises
       13.2 Word Alignment
       13.3 Statistical Machine Translation
       13.4 Further Reading
    14 Clustering
       14.1 Hierarchical Clustering
            14.1.1 Single-link and complete-link clustering
            14.1.2 Group-average agglomerative clustering
            14.1.3 An application: Improving a language model
            14.1.4 Top-down clustering
       14.2 Non-Hierarchical Clustering
            14.2.1 K-means
            14.2.2 The EM algorithm
       14.3 Further Reading
       14.4 Exercises
    15 Topics in Information Retrieval
       15.1 Some Background on Information Retrieval
            15.1.1 Common design features of IR systems
            15.1.2 Evaluation measures
            15.1.3 The probability ranking principle (PRP)
       15.2 The Vector Space Model
            15.2.1 Vector similarity
            15.2.2 Term weighting
       15.3 Term Distribution Models
            15.3.1 The Poisson distribution
            15.3.2 The two-Poisson model
            15.3.3 The K mixture
            15.3.4 Inverse document frequency
            15.3.5 Residual inverse document frequency
            15.3.6 Usage of term distribution models
       15.4 Latent Semantic Indexing
            15.4.1 Least-squares methods
            15.4.2 Singular Value Decomposition
            15.4.3 Latent Semantic Indexing in IR
       15.5 Discourse Segmentation
            15.5.1 TextTiling
       15.6 Further Reading
       15.7 Exercises
    16 Text Categorization
       16.1 Decision Trees
       16.2 Maximum Entropy Modeling
            16.2.1 Generalized iterative scaling
            16.2.2 Application to text categorization
       16.3 Perceptrons
       16.4 k Nearest Neighbor Classification
       16.5 Further Reading
Tiny Statistical Tables
Bibliography
Index

Statistical approaches to processing natural language text have become dominant in recent years. This foundational text is the first comprehensive introduction to statistical natural language processing (NLP). It contains the theory and algorithms needed for building NLP tools, and provides broad but rigorous coverage of mathematical and linguistic foundations alongside detailed discussion of statistical methods, allowing students and researchers to construct their own implementations. The book covers collocation finding, word sense disambiguation, probabilistic parsing, information retrieval, and other applications.

