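Our ability to generate and collect data is growing rapidly. Beyond the data produced by the increasing computerization of most business, scientific, and government transactions, the widespread use of digital cameras, tools, and bar codes also generates data. On the collection side, scanned text and image platforms, satellite remote sensing systems, and the Internet have surrounded our lives with enormous volumes of data. This explosive growth makes it more urgent than ever to have new techniques and automated tools that help us transform this data into useful information and knowledge.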
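The first edition of this book was voted a popular data mining title by KDnuggets readers and is a highly readable textbook. It systematically introduces, from a database perspective, the basic concepts, methods, and techniques of data mining as well as research progress in data mining technology, with a focus on feasibility, usefulness, effectiveness, and scalability. Since the first edition was published, however, data mining research has made great progress, producing new data mining methods, systems, and applications. The second edition strengthens coverage in this area, adding several chapters on data mining methods for complex types of data, including stream data, sequence data, graph-structured data, social network data, and multirelational data.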
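This book is suitable as an elective textbook for senior undergraduates in computer science and related majors, and is especially suitable as a core textbook for graduate students. It can also serve as an essential reference for those engaged in data mining research and application development.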
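Key features of this book: It gives a practical treatment of the concepts and techniques readers need to know, drawn from real business data. It is updated to incorporate reader feedback, technical changes in the data mining field, and more material on statistics and machine learning. It includes many algorithms and practical examples, all written in easy-to-understand pseudocode and applicable to real, large-scale data mining projects.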
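Jiawei Han is a professor in the Department of Computer Science at the University of Illinois at Urbana-Champaign. For his productive research in data mining and database systems, he has received numerous honors and awards, including the 2004 ACM SIGKDD Innovation Award. He is also editor-in-chief of ACM Transactions on Knowledge Discovery from Data, and ...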
Foreword
Preface
Chapter 1 Introduction
1.1 What Motivated Data Mining? Why Is It Important?
1.2 So, What Is Data Mining?
1.3 Data Mining: On What Kind of Data?
1.3.1 Relational Databases
1.3.2 Data Warehouses
1.3.3 Transactional Databases
1.3.4 Advanced Data and Information Systems and Advanced Applications
1.4 Data Mining Functionalities: What Kinds of Patterns Can Be Mined?
1.4.1 Concept/Class Description: Characterization and Discrimination
1.4.2 Mining Frequent Patterns, Associations, and Correlations
1.4.3 Classification and Prediction
1.4.4 Cluster Analysis
1.4.5 Outlier Analysis
1.4.6 Evolution Analysis
1.5 Are All of the Patterns Interesting?
1.6 Classification of Data Mining Systems
1.7 Data Mining Task Primitives
1.8 Integration of a Data Mining System with a Database or Data Warehouse System
1.9 Major Issues in Data Mining
1.10 Summary
Exercises
Bibliographic Notes
Chapter 2 Data Preprocessing
2.1 Why Preprocess the Data?
2.2 Descriptive Data Summarization
2.2.1 Measuring the Central Tendency
2.2.2 Measuring the Dispersion of Data
2.2.3 Graphic Displays of Basic Descriptive Data Summaries
2.3 Data Cleaning
2.3.1 Missing Values
2.3.2 Noisy Data
2.3.3 Data Cleaning as a Process
2.4 Data Integration and Transformation
2.4.1 Data Integration
2.4.2 Data Transformation
2.5 Data Reduction
2.5.1 Data Cube Aggregation
2.5.2 Attribute Subset Selection
2.5.3 Dimensionality Reduction
2.5.4 Numerosity Reduction
2.6 Data Discretization and Concept Hierarchy Generation
2.6.1 Discretization and Concept Hierarchy Generation for Numerical Data
2.6.2 Concept Hierarchy Generation for Categorical Data
2.7 Summary
Exercises
Bibliographic Notes
Chapter 3 Data Warehouse and OLAP Technology: An Overview
3.1 What Is a Data Warehouse?
3.1.1 Differences between Operational Database Systems and Data Warehouses
3.1.2 But, Why Have a Separate Data Warehouse?
3.2 A Multidimensional Data Model
3.2.1 From Tables and Spreadsheets to Data Cubes
3.2.2 Stars, Snowflakes, and Fact Constellations: Schemas for Multidimensional Databases
3.2.3 Examples for Defining Star, Snowflake, and Fact Constellation Schemas
……
Chapter 4 Data Cube Computation and Data Generalization
Chapter 5 Mining Frequent Patterns, Associations, and Correlations
Chapter 6 Classification and Prediction
Chapter 7 Cluster Analysis
Chapter 8 Mining Stream, Time-Series, and Sequence Data
Chapter 9 Graph Mining, Social Network Analysis, and Multirelational
Chapter 10 Mining Object, Spatial, Multimedia, Text, and Web Data
Chapter 11 Applications and Trends in Data Mining
An Introduction to Microsoft's OLE DB for
Bibliography
Index