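Book Information

Data Mining: Concepts and Techniques (English Edition, 3rd Edition)
  • Authors: Jiawei Han (USA), Micheline Kamber (USA)
  • Publisher: Beijing: China Machine Press
  • ISBN: 9787111374312
  • Publication year: 2012
  • Stated page count: 703 pages
  • File size: 46MB
  • File page count: 733 pages
  • Subject heading: Data acquisition - English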


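Download Notes

The downloaded file is a RAR archive. Extract it with an archive utility to obtain the book in PDF format.

We recommend downloading with the BT client Free Download Manager (FDM), which is free, ad-free, and cross-platform. All resources on this site are packaged as BT torrents, so a dedicated BT client is required, such as BitComet, qBittorrent, or uTorrent. Xunlei is not recommended for now because this site's resources are not yet popular; once they are, Xunlei can also be used.

(The file page count should be greater than the stated page count, except for e-books split into multiple volumes.)

Note: every archive on this site requires an extraction code.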

Table of Contents

Chapter 1 Introduction

1.1 Why Data Mining?

1.1.1 Moving toward the Information Age

1.1.2 Data Mining as the Evolution of Information Technology

1.2 What Is Data Mining?

1.3 What Kinds of Data Can Be Mined?

1.3.1 Database Data

1.3.2 Data Warehouses

1.3.3 Transactional Data

1.3.4 Other Kinds of Data

1.4 What Kinds of Patterns Can Be Mined?

1.4.1 Class/Concept Description: Characterization and Discrimination

1.4.2 Mining Frequent Patterns, Associations, and Correlations

1.4.3 Classification and Regression for Predictive Analysis

1.4.4 Cluster Analysis

1.4.5 Outlier Analysis

1.4.6 Are All Patterns Interesting?

1.5 Which Technologies Are Used?

1.5.1 Statistics

1.5.2 Machine Learning

1.5.3 Database Systems and Data Warehouses

1.5.4 Information Retrieval

1.6 Which Kinds of Applications Are Targeted?

1.6.1 Business Intelligence

1.6.2 Web Search Engines

1.7 Major Issues in Data Mining

1.7.1 Mining Methodology

1.7.2 User Interaction

1.7.3 Efficiency and Scalability

1.7.4 Diversity of Database Types

1.7.5 Data Mining and Society

1.8 Summary

1.9 Exercises

1.10 Bibliographic Notes

Chapter 2 Getting to Know Your Data

2.1 Data Objects and Attribute Types

2.1.1 What Is an Attribute?

2.1.2 Nominal Attributes

2.1.3 Binary Attributes

2.1.4 Ordinal Attributes

2.1.5 Numeric Attributes

2.1.6 Discrete versus Continuous Attributes

2.2 Basic Statistical Descriptions of Data

2.2.1 Measuring the Central Tendency: Mean, Median, and Mode

2.2.2 Measuring the Dispersion of Data: Range, Quartiles, Variance, Standard Deviation, and Interquartile Range

2.2.3 Graphic Displays of Basic Statistical Descriptions of Data

2.3 Data Visualization

2.3.1 Pixel-Oriented Visualization Techniques

2.3.2 Geometric Projection Visualization Techniques

2.3.3 Icon-Based Visualization Techniques

2.3.4 Hierarchical Visualization Techniques

2.3.5 Visualizing Complex Data and Relations

2.4 Measuring Data Similarity and Dissimilarity

2.4.1 Data Matrix versus Dissimilarity Matrix

2.4.2 Proximity Measures for Nominal Attributes

2.4.3 Proximity Measures for Binary Attributes

2.4.4 Dissimilarity of Numeric Data: Minkowski Distance

2.4.5 Proximity Measures for Ordinal Attributes

2.4.6 Dissimilarity for Attributes of Mixed Types

2.4.7 Cosine Similarity

2.5 Summary

2.6 Exercises

2.7 Bibliographic Notes

Chapter 3 Data Preprocessing

3.1 Data Preprocessing: An Overview

3.1.1 Data Quality: Why Preprocess the Data?

3.1.2 Major Tasks in Data Preprocessing

3.2 Data Cleaning

3.2.1 Missing Values

3.2.2 Noisy Data

3.2.3 Data Cleaning as a Process

3.3 Data Integration

3.3.1 Entity Identification Problem

3.3.2 Redundancy and Correlation Analysis

3.3.3 Tuple Duplication

3.3.4 Data Value Conflict Detection and Resolution

3.4 Data Reduction

3.4.1 Overview of Data Reduction Strategies

3.4.2 Wavelet Transforms

3.4.3 Principal Components Analysis

3.4.4 Attribute Subset Selection

3.4.5 Regression and Log-Linear Models: Parametric Data Reduction

3.4.6 Histograms

3.4.7 Clustering

3.4.8 Sampling

3.4.9 Data Cube Aggregation

3.5 Data Transformation and Data Discretization

3.5.1 Data Transformation Strategies Overview

3.5.2 Data Transformation by Normalization

3.5.3 Discretization by Binning

3.5.4 Discretization by Histogram Analysis

3.5.5 Discretization by Cluster, Decision Tree, and Correlation Analyses

3.5.6 Concept Hierarchy Generation for Nominal Data

3.6 Summary

3.7 Exercises

3.8 Bibliographic Notes

Chapter 4 Data Warehousing and Online Analytical Processing

4.1 Data Warehouse: Basic Concepts

4.1.1 What Is a Data Warehouse?

4.1.2 Differences between Operational Database Systems and Data Warehouses

4.1.3 But, Why Have a Separate Data Warehouse?

4.1.4 Data Warehousing: A Multitiered Architecture

4.1.5 Data Warehouse Models: Enterprise Warehouse, Data Mart, and Virtual Warehouse

4.1.6 Extraction, Transformation, and Loading

4.1.7 Metadata Repository

4.2 Data Warehouse Modeling: Data Cube and OLAP

4.2.1 Data Cube: A Multidimensional Data Model

4.2.2 Stars, Snowflakes, and Fact Constellations: Schemas for Multidimensional Data Models

4.2.3 Dimensions: The Role of Concept Hierarchies

4.2.4 Measures: Their Categorization and Computation

4.2.5 Typical OLAP Operations

4.2.6 A Starnet Query Model for Querying Multidimensional Databases

4.3 Data Warehouse Design and Usage

4.3.1 A Business Analysis Framework for Data Warehouse Design

4.3.2 Data Warehouse Design Process

4.3.3 Data Warehouse Usage for Information Processing

4.3.4 From Online Analytical Processing to Multidimensional Data Mining

4.4 Data Warehouse Implementation

4.4.1 Efficient Data Cube Computation: An Overview

4.4.2 Indexing OLAP Data: Bitmap Index and Join Index

4.4.3 Efficient Processing of OLAP Queries

4.4.4 OLAP Server Architectures: ROLAP versus MOLAP versus HOLAP

4.5 Data Generalization by Attribute-Oriented Induction

4.5.1 Attribute-Oriented Induction for Data Characterization

4.5.2 Efficient Implementation of Attribute-Oriented Induction

4.5.3 Attribute-Oriented Induction for Class Comparisons

4.6 Summary

4.7 Exercises

4.8 Bibliographic Notes

Chapter 5 Data Cube Technology

5.1 Data Cube Computation: Preliminary Concepts

5.1.1 Cube Materialization: Full Cube, Iceberg Cube, Closed Cube, and Cube Shell

5.1.2 General Strategies for Data Cube Computation

5.2 Data Cube Computation Methods

5.2.1 Multiway Array Aggregation for Full Cube Computation

5.2.2 BUC: Computing Iceberg Cubes from the Apex Cuboid Downward

5.2.3 Star-Cubing: Computing Iceberg Cubes Using a Dynamic Star-Tree Structure

5.2.4 Precomputing Shell Fragments for Fast High-Dimensional OLAP

5.3 Processing Advanced Kinds of Queries by Exploring Cube Technology

5.3.1 Sampling Cubes: OLAP-Based Mining on Sampling Data

5.3.2 Ranking Cubes: Efficient Computation of Top-k Queries

5.4 Multidimensional Data Analysis in Cube Space

5.4.1 Prediction Cubes: Prediction Mining in Cube Space

5.4.2 Multifeature Cubes: Complex Aggregation at Multiple Granularities

5.4.3 Exception-Based, Discovery-Driven Cube Space Exploration

5.5 Summary

5.6 Exercises

5.7 Bibliographic Notes

Chapter 6 Mining Frequent Patterns, Associations, and Correlations: Basic Concepts and Methods

6.1 Basic Concepts

6.1.1 Market Basket Analysis: A Motivating Example

6.1.2 Frequent Itemsets, Closed Itemsets, and Association Rules

6.2 Frequent Itemset Mining Methods

6.2.1 Apriori Algorithm: Finding Frequent Itemsets by Confined Candidate Generation

6.2.2 Generating Association Rules from Frequent Itemsets

6.2.3 Improving the Efficiency of Apriori

6.2.4 A Pattern-Growth Approach for Mining Frequent Itemsets

6.2.5 Mining Frequent Itemsets Using Vertical Data Format

6.2.6 Mining Closed and Max Patterns

6.3 Which Patterns Are Interesting? Pattern Evaluation Methods

6.3.1 Strong Rules Are Not Necessarily Interesting

6.3.2 From Association Analysis to Correlation Analysis

6.3.3 A Comparison of Pattern Evaluation Measures

6.4 Summary

6.5 Exercises

6.6 Bibliographic Notes

Chapter 7 Advanced Pattern Mining

7.1 Pattern Mining: A Road Map

7.2 Pattern Mining in Multilevel, Multidimensional Space

7.2.1 Mining Multilevel Associations

7.2.2 Mining Multidimensional Associations

7.2.3 Mining Quantitative Association Rules

7.2.4 Mining Rare Patterns and Negative Patterns

7.3 Constraint-Based Frequent Pattern Mining

7.3.1 Metarule-Guided Mining of Association Rules

7.3.2 Constraint-Based Pattern Generation: Pruning Pattern Space and Pruning Data Space

7.4 Mining High-Dimensional Data and Colossal Patterns

7.4.1 Mining Colossal Patterns by Pattern-Fusion

7.5 Mining Compressed or Approximate Patterns

7.5.1 Mining Compressed Patterns by Pattern Clustering

7.5.2 Extracting Redundancy-Aware Top-k Patterns

7.6 Pattern Exploration and Application

7.6.1 Semantic Annotation of Frequent Patterns

7.6.2 Applications of Pattern Mining

7.7 Summary

7.8 Exercises

7.9 Bibliographic Notes

Chapter 8 Classification: Basic Concepts

8.1 Basic Concepts

8.1.1 What Is Classification?

8.1.2 General Approach to Classification

8.2 Decision Tree Induction

8.2.1 Decision Tree Induction

8.2.2 Attribute Selection Measures

8.2.3 Tree Pruning

8.2.4 Scalability and Decision Tree Induction

8.2.5 Visual Mining for Decision Tree Induction

8.3 Bayes Classification Methods

8.3.1 Bayes’ Theorem

8.3.2 Naïve Bayesian Classification

8.4 Rule-Based Classification

8.4.1 Using IF-THEN Rules for Classification

8.4.2 Rule Extraction from a Decision Tree

8.4.3 Rule Induction Using a Sequential Covering Algorithm

8.5 Model Evaluation and Selection

8.5.1 Metrics for Evaluating Classifier Performance

8.5.2 Holdout Method and Random Subsampling

8.5.3 Cross-Validation

8.5.4 Bootstrap

8.5.5 Model Selection Using Statistical Tests of Significance

8.5.6 Comparing Classifiers Based on Cost-Benefit and ROC Curves

8.6 Techniques to Improve Classification Accuracy

8.6.1 Introducing Ensemble Methods

8.6.2 Bagging

8.6.3 Boosting and AdaBoost

8.6.4 Random Forests

8.6.5 Improving Classification Accuracy of Class-Imbalanced Data

8.7 Summary

8.8 Exercises

8.9 Bibliographic Notes

Chapter 9 Classification: Advanced Methods

9.1 Bayesian Belief Networks

9.1.1 Concepts and Mechanisms

9.1.2 Training Bayesian Belief Networks

9.2 Classification by Backpropagation

9.2.1 A Multilayer Feed-Forward Neural Network

9.2.2 Defining a Network Topology

9.2.3 Backpropagation

9.2.4 Inside the Black Box: Backpropagation and Interpretability

9.3 Support Vector Machines

9.3.1 The Case When the Data Are Linearly Separable

9.3.2 The Case When the Data Are Linearly Inseparable

9.4 Classification Using Frequent Patterns

9.4.1 Associative Classification

9.4.2 Discriminative Frequent Pattern-Based Classification

9.5 Lazy Learners (or Learning from Your Neighbors)

9.5.1 k-Nearest-Neighbor Classifiers

9.5.2 Case-Based Reasoning

9.6 Other Classification Methods

9.6.1 Genetic Algorithms

9.6.2 Rough Set Approach

9.6.3 Fuzzy Set Approaches

9.7 Additional Topics Regarding Classification

9.7.1 Multiclass Classification

9.7.2 Semi-Supervised Classification

9.7.3 Active Learning

9.7.4 Transfer Learning

9.8 Summary

9.9 Exercises

9.10 Bibliographic Notes

Chapter 10 Cluster Analysis: Basic Concepts and Methods

10.1 Cluster Analysis

10.1.1 What Is Cluster Analysis?

10.1.2 Requirements for Cluster Analysis

10.1.3 Overview of Basic Clustering Methods

10.2 Partitioning Methods

10.2.1 k-Means: A Centroid-Based Technique

10.2.2 k-Medoids: A Representative Object-Based Technique

10.3 Hierarchical Methods

10.3.1 Agglomerative versus Divisive Hierarchical Clustering

10.3.2 Distance Measures in Algorithmic Methods

10.3.3 BIRCH: Multiphase Hierarchical Clustering Using Clustering Feature Trees

10.3.4 Chameleon: Multiphase Hierarchical Clustering Using Dynamic Modeling

10.3.5 Probabilistic Hierarchical Clustering

10.4 Density-Based Methods

10.4.1 DBSCAN: Density-Based Clustering Based on Connected Regions with High Density

10.4.2 OPTICS: Ordering Points to Identify the Clustering Structure

10.4.3 DENCLUE: Clustering Based on Density Distribution Functions

10.5 Grid-Based Methods

10.5.1 STING: STatistical INformation Grid

10.5.2 CLIQUE: An Apriori-like Subspace Clustering Method

10.6 Evaluation of Clustering

10.6.1 Assessing Clustering Tendency

10.6.2 Determining the Number of Clusters

10.6.3 Measuring Clustering Quality

10.7 Summary

10.8 Exercises

10.9 Bibliographic Notes

Chapter 11 Advanced Cluster Analysis

11.1 Probabilistic Model-Based Clustering

11.1.1 Fuzzy Clusters

11.1.2 Probabilistic Model-Based Clusters

11.1.3 Expectation-Maximization Algorithm

11.2 Clustering High-Dimensional Data

11.2.1 Clustering High-Dimensional Data: Problems, Challenges, and Major Methodologies

11.2.2 Subspace Clustering Methods

11.2.3 Biclustering

11.2.4 Dimensionality Reduction Methods and Spectral Clustering

11.3 Clustering Graph and Network Data

11.3.1 Applications and Challenges

11.3.2 Similarity Measures

11.3.3 Graph Clustering Methods

11.4 Clustering with Constraints

11.4.1 Categorization of Constraints

11.4.2 Methods for Clustering with Constraints

11.5 Summary

11.6 Exercises

11.7 Bibliographic Notes

Chapter 12 Outlier Detection

12.1 Outliers and Outlier Analysis

12.1.1 What Are Outliers?

12.1.2 Types of Outliers

12.1.3 Challenges of Outlier Detection

12.2 Outlier Detection Methods

12.2.1 Supervised, Semi-Supervised, and Unsupervised Methods

12.2.2 Statistical Methods, Proximity-Based Methods, and Clustering-Based Methods

12.3 Statistical Approaches

12.3.1 Parametric Methods

12.3.2 Nonparametric Methods

12.4 Proximity-Based Approaches

12.4.1 Distance-Based Outlier Detection and a Nested Loop Method

12.4.2 A Grid-Based Method

12.4.3 Density-Based Outlier Detection

12.5 Clustering-Based Approaches

12.6 Classification-Based Approaches

12.7 Mining Contextual and Collective Outliers

12.7.1 Transforming Contextual Outlier Detection to Conventional Outlier Detection

12.7.2 Modeling Normal Behavior with Respect to Contexts

12.7.3 Mining Collective Outliers

12.8 Outlier Detection in High-Dimensional Data

12.8.1 Extending Conventional Outlier Detection

12.8.2 Finding Outliers in Subspaces

12.8.3 Modeling High-Dimensional Outliers

12.9 Summary

12.10 Exercises

12.11 Bibliographic Notes

Chapter 13 Data Mining Trends and Research Frontiers

13.1 Mining Complex Data Types

13.1.1 Mining Sequence Data: Time-Series, Symbolic Sequences, and Biological Sequences

13.1.2 Mining Graphs and Networks

13.1.3 Mining Other Kinds of Data

13.2 Other Methodologies of Data Mining

13.2.1 Statistical Data Mining

13.2.2 Views on Data Mining Foundations

13.2.3 Visual and Audio Data Mining

13.3 Data Mining Applications

13.3.1 Data Mining for Financial Data Analysis

13.3.2 Data Mining for Retail and Telecommunication Industries

13.3.3 Data Mining in Science and Engineering

13.3.4 Data Mining for Intrusion Detection and Prevention

13.3.5 Data Mining and Recommender Systems

13.4 Data Mining and Society

13.4.1 Ubiquitous and Invisible Data Mining

13.4.2 Privacy, Security, and Social Impacts of Data Mining

13.5 Data Mining Trends

13.6 Summary

13.7 Exercises

13.8 Bibliographic Notes

Bibliography

Index
