Book Description
Data Mining: Concepts and Techniques (English Edition, 3rd Edition)

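- Authors: Jiawei Han (US), Micheline Kamber (US)
- Publisher: Beijing: China Machine Press (机械工业出版社)
- ISBN: 9787111374312
- Publication year: 2012
- Listed page count: 703 pages
- File size: 46 MB
- File page count: 733 pages
- Subject heading: Data acquisition (English)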
Table of Contents
Chapter 1 Introduction
1.1 Why Data Mining?
1.1.1 Moving toward the Information Age
1.1.2 Data Mining as the Evolution of Information Technology
1.2 What Is Data Mining?
1.3 What Kinds of Data Can Be Mined?
1.3.1 Database Data
1.3.2 Data Warehouses
1.3.3 Transactional Data
1.3.4 Other Kinds of Data
1.4 What Kinds of Patterns Can Be Mined?
1.4.1 Class/Concept Description: Characterization and Discrimination
1.4.2 Mining Frequent Patterns, Associations, and Correlations
1.4.3 Classification and Regression for Predictive Analysis
1.4.4 Cluster Analysis
1.4.5 Outlier Analysis
1.4.6 Are All Patterns Interesting?
1.5 Which Technologies Are Used?
1.5.1 Statistics
1.5.2 Machine Learning
1.5.3 Database Systems and Data Warehouses
1.5.4 Information Retrieval
1.6 Which Kinds of Applications Are Targeted?
1.6.1 Business Intelligence
1.6.2 Web Search Engines
1.7 Major Issues in Data Mining
1.7.1 Mining Methodology
1.7.2 User Interaction
1.7.3 Efficiency and Scalability
1.7.4 Diversity of Database Types
1.7.5 Data Mining and Society
1.8 Summary
1.9 Exercises
1.10 Bibliographic Notes
Chapter 2 Getting to Know Your Data
2.1 Data Objects and Attribute Types
2.1.1 What Is an Attribute?
2.1.2 Nominal Attributes
2.1.3 Binary Attributes
2.1.4 Ordinal Attributes
2.1.5 Numeric Attributes
2.1.6 Discrete versus Continuous Attributes
2.2 Basic Statistical Descriptions of Data
2.2.1 Measuring the Central Tendency: Mean, Median, and Mode
2.2.2 Measuring the Dispersion of Data: Range, Quartiles, Variance, Standard Deviation, and Interquartile Range
2.2.3 Graphic Displays of Basic Statistical Descriptions of Data
2.3 Data Visualization
2.3.1 Pixel-Oriented Visualization Techniques
2.3.2 Geometric Projection Visualization Techniques
2.3.3 Icon-Based Visualization Techniques
2.3.4 Hierarchical Visualization Techniques
2.3.5 Visualizing Complex Data and Relations
2.4 Measuring Data Similarity and Dissimilarity
2.4.1 Data Matrix versus Dissimilarity Matrix
2.4.2 Proximity Measures for Nominal Attributes
2.4.3 Proximity Measures for Binary Attributes
2.4.4 Dissimilarity of Numeric Data: Minkowski Distance
2.4.5 Proximity Measures for Ordinal Attributes
2.4.6 Dissimilarity for Attributes of Mixed Types
2.4.7 Cosine Similarity
2.5 Summary
2.6 Exercises
2.7 Bibliographic Notes
Chapter 3 Data Preprocessing
3.1 Data Preprocessing: An Overview
3.1.1 Data Quality: Why Preprocess the Data?
3.1.2 Major Tasks in Data Preprocessing
3.2 Data Cleaning
3.2.1 Missing Values
3.2.2 Noisy Data
3.2.3 Data Cleaning as a Process
3.3 Data Integration
3.3.1 Entity Identification Problem
3.3.2 Redundancy and Correlation Analysis
3.3.3 Tuple Duplication
3.3.4 Data Value Conflict Detection and Resolution
3.4 Data Reduction
3.4.1 Overview of Data Reduction Strategies
3.4.2 Wavelet Transforms
3.4.3 Principal Components Analysis
3.4.4 Attribute Subset Selection
3.4.5 Regression and Log-Linear Models: Parametric Data Reduction
3.4.6 Histograms
3.4.7 Clustering
3.4.8 Sampling
3.4.9 Data Cube Aggregation
3.5 Data Transformation and Data Discretization
3.5.1 Data Transformation Strategies Overview
3.5.2 Data Transformation by Normalization
3.5.3 Discretization by Binning
3.5.4 Discretization by Histogram Analysis
3.5.5 Discretization by Cluster, Decision Tree, and Correlation Analyses
3.5.6 Concept Hierarchy Generation for Nominal Data
3.6 Summary
3.7 Exercises
3.8 Bibliographic Notes
Chapter 4 Data Warehousing and Online Analytical Processing
4.1 Data Warehouse: Basic Concepts
4.1.1 What Is a Data Warehouse?
4.1.2 Differences between Operational Database Systems and Data Warehouses
4.1.3 But, Why Have a Separate Data Warehouse?
4.1.4 Data Warehousing: A Multitiered Architecture
4.1.5 Data Warehouse Models: Enterprise Warehouse, Data Mart, and Virtual Warehouse
4.1.6 Extraction, Transformation, and Loading
4.1.7 Metadata Repository
4.2 Data Warehouse Modeling: Data Cube and OLAP
4.2.1 Data Cube: A Multidimensional Data Model
4.2.2 Stars, Snowflakes, and Fact Constellations: Schemas for Multidimensional Data Models
4.2.3 Dimensions: The Role of Concept Hierarchies
4.2.4 Measures: Their Categorization and Computation
4.2.5 Typical OLAP Operations
4.2.6 A Starnet Query Model for Querying Multidimensional Databases
4.3 Data Warehouse Design and Usage
4.3.1 A Business Analysis Framework for Data Warehouse Design
4.3.2 Data Warehouse Design Process
4.3.3 Data Warehouse Usage for Information Processing
4.3.4 From Online Analytical Processing to Multidimensional Data Mining
4.4 Data Warehouse Implementation
4.4.1 Efficient Data Cube Computation: An Overview
4.4.2 Indexing OLAP Data: Bitmap Index and Join Index
4.4.3 Efficient Processing of OLAP Queries
4.4.4 OLAP Server Architectures: ROLAP versus MOLAP versus HOLAP
4.5 Data Generalization by Attribute-Oriented Induction
4.5.1 Attribute-Oriented Induction for Data Characterization
4.5.2 Efficient Implementation of Attribute-Oriented Induction
4.5.3 Attribute-Oriented Induction for Class Comparisons
4.6 Summary
4.7 Exercises
4.8 Bibliographic Notes
Chapter 5 Data Cube Technology
5.1 Data Cube Computation: Preliminary Concepts
5.1.1 Cube Materialization: Full Cube, Iceberg Cube, Closed Cube, and Cube Shell
5.1.2 General Strategies for Data Cube Computation
5.2 Data Cube Computation Methods
5.2.1 Multiway Array Aggregation for Full Cube Computation
5.2.2 BUC: Computing Iceberg Cubes from the Apex Cuboid Downward
5.2.3 Star-Cubing: Computing Iceberg Cubes Using a Dynamic Star-Tree Structure
5.2.4 Precomputing Shell Fragments for Fast High-Dimensional OLAP
5.3 Processing Advanced Kinds of Queries by Exploring Cube Technology
5.3.1 Sampling Cubes: OLAP-Based Mining on Sampling Data
5.3.2 Ranking Cubes: Efficient Computation of Top-k Queries
5.4 Multidimensional Data Analysis in Cube Space
5.4.1 Prediction Cubes: Prediction Mining in Cube Space
5.4.2 Multifeature Cubes: Complex Aggregation at Multiple Granularities
5.4.3 Exception-Based, Discovery-Driven Cube Space Exploration
5.5 Summary
5.6 Exercises
5.7 Bibliographic Notes
Chapter 6 Mining Frequent Patterns, Associations, and Correlations: Basic Concepts and Methods
6.1 Basic Concepts
6.1.1 Market Basket Analysis: A Motivating Example
6.1.2 Frequent Itemsets, Closed Itemsets, and Association Rules
6.2 Frequent Itemset Mining Methods
6.2.1 Apriori Algorithm: Finding Frequent Itemsets by Confined Candidate Generation
6.2.2 Generating Association Rules from Frequent Itemsets
6.2.3 Improving the Efficiency of Apriori
6.2.4 A Pattern-Growth Approach for Mining Frequent Itemsets
6.2.5 Mining Frequent Itemsets Using Vertical Data Format
6.2.6 Mining Closed and Max Patterns
6.3 Which Patterns Are Interesting?—Pattern Evaluation Methods
6.3.1 Strong Rules Are Not Necessarily Interesting
6.3.2 From Association Analysis to Correlation Analysis
6.3.3 A Comparison of Pattern Evaluation Measures
6.4 Summary
6.5 Exercises
6.6 Bibliographic Notes
Chapter 7 Advanced Pattern Mining
7.1 Pattern Mining: A Road Map
7.2 Pattern Mining in Multilevel, Multidimensional Space
7.2.1 Mining Multilevel Associations
7.2.2 Mining Multidimensional Associations
7.2.3 Mining Quantitative Association Rules
7.2.4 Mining Rare Patterns and Negative Patterns
7.3 Constraint-Based Frequent Pattern Mining
7.3.1 Metarule-Guided Mining of Association Rules
7.3.2 Constraint-Based Pattern Generation: Pruning Pattern Space and Pruning Data Space
7.4 Mining High-Dimensional Data and Colossal Patterns
7.4.1 Mining Colossal Patterns by Pattern-Fusion
7.5 Mining Compressed or Approximate Patterns
7.5.1 Mining Compressed Patterns by Pattern Clustering
7.5.2 Extracting Redundancy-Aware Top-k Patterns
7.6 Pattern Exploration and Application
7.6.1 Semantic Annotation of Frequent Patterns
7.6.2 Applications of Pattern Mining
7.7 Summary
7.8 Exercises
7.9 Bibliographic Notes
Chapter 8 Classification: Basic Concepts
8.1 Basic Concepts
8.1.1 What Is Classification?
8.1.2 General Approach to Classification
8.2 Decision Tree Induction
8.2.1 Decision Tree Induction
8.2.2 Attribute Selection Measures
8.2.3 Tree Pruning
8.2.4 Scalability and Decision Tree Induction
8.2.5 Visual Mining for Decision Tree Induction
8.3 Bayes Classification Methods
8.3.1 Bayes’ Theorem
8.3.2 Naïve Bayesian Classification
8.4 Rule-Based Classification
8.4.1 Using IF-THEN Rules for Classification
8.4.2 Rule Extraction from a Decision Tree
8.4.3 Rule Induction Using a Sequential Covering Algorithm
8.5 Model Evaluation and Selection
8.5.1 Metrics for Evaluating Classifier Performance
8.5.2 Holdout Method and Random Subsampling
8.5.3 Cross-Validation
8.5.4 Bootstrap
8.5.5 Model Selection Using Statistical Tests of Significance
8.5.6 Comparing Classifiers Based on Cost-Benefit and ROC Curves
8.6 Techniques to Improve Classification Accuracy
8.6.1 Introducing Ensemble Methods
8.6.2 Bagging
8.6.3 Boosting and AdaBoost
8.6.4 Random Forests
8.6.5 Improving Classification Accuracy of Class-Imbalanced Data
8.7 Summary
8.8 Exercises
8.9 Bibliographic Notes
Chapter 9 Classification: Advanced Methods
9.1 Bayesian Belief Networks
9.1.1 Concepts and Mechanisms
9.1.2 Training Bayesian Belief Networks
9.2 Classification by Backpropagation
9.2.1 A Multilayer Feed-Forward Neural Network
9.2.2 Defining a Network Topology
9.2.3 Backpropagation
9.2.4 Inside the Black Box: Backpropagation and Interpretability
9.3 Support Vector Machines
9.3.1 The Case When the Data Are Linearly Separable
9.3.2 The Case When the Data Are Linearly Inseparable
9.4 Classification Using Frequent Patterns
9.4.1 Associative Classification
9.4.2 Discriminative Frequent Pattern-Based Classification
9.5 Lazy Learners (or Learning from Your Neighbors)
9.5.1 k-Nearest-Neighbor Classifiers
9.5.2 Case-Based Reasoning
9.6 Other Classification Methods
9.6.1 Genetic Algorithms
9.6.2 Rough Set Approach
9.6.3 Fuzzy Set Approaches
9.7 Additional Topics Regarding Classification
9.7.1 Multiclass Classification
9.7.2 Semi-Supervised Classification
9.7.3 Active Learning
9.7.4 Transfer Learning
9.8 Summary
9.9 Exercises
9.10 Bibliographic Notes
Chapter 10 Cluster Analysis: Basic Concepts and Methods
10.1 Cluster Analysis
10.1.1 What Is Cluster Analysis?
10.1.2 Requirements for Cluster Analysis
10.1.3 Overview of Basic Clustering Methods
10.2 Partitioning Methods
10.2.1 k-Means: A Centroid-Based Technique
10.2.2 k-Medoids: A Representative Object-Based Technique
10.3 Hierarchical Methods
10.3.1 Agglomerative versus Divisive Hierarchical Clustering
10.3.2 Distance Measures in Algorithmic Methods
10.3.3 BIRCH: Multiphase Hierarchical Clustering Using Clustering Feature Trees
10.3.4 Chameleon: Multiphase Hierarchical Clustering Using Dynamic Modeling
10.3.5 Probabilistic Hierarchical Clustering
10.4 Density-Based Methods
10.4.1 DBSCAN: Density-Based Clustering Based on Connected Regions with High Density
10.4.2 OPTICS: Ordering Points to Identify the Clustering Structure
10.4.3 DENCLUE: Clustering Based on Density Distribution Functions
10.5 Grid-Based Methods
10.5.1 STING: STatistical INformation Grid
10.5.2 CLIQUE: An Apriori-like Subspace Clustering Method
10.6 Evaluation of Clustering
10.6.1 Assessing Clustering Tendency
10.6.2 Determining the Number of Clusters
10.6.3 Measuring Clustering Quality
10.7 Summary
10.8 Exercises
10.9 Bibliographic Notes
Chapter 11 Advanced Cluster Analysis
11.1 Probabilistic Model-Based Clustering
11.1.1 Fuzzy Clusters
11.1.2 Probabilistic Model-Based Clusters
11.1.3 Expectation-Maximization Algorithm
11.2 Clustering High-Dimensional Data
11.2.1 Clustering High-Dimensional Data: Problems, Challenges, and Major Methodologies
11.2.2 Subspace Clustering Methods
11.2.3 Biclustering
11.2.4 Dimensionality Reduction Methods and Spectral Clustering
11.3 Clustering Graph and Network Data
11.3.1 Applications and Challenges
11.3.2 Similarity Measures
11.3.3 Graph Clustering Methods
11.4 Clustering with Constraints
11.4.1 Categorization of Constraints
11.4.2 Methods for Clustering with Constraints
11.5 Summary
11.6 Exercises
11.7 Bibliographic Notes
Chapter 12 Outlier Detection
12.1 Outliers and Outlier Analysis
12.1.1 What Are Outliers?
12.1.2 Types of Outliers
12.1.3 Challenges of Outlier Detection
12.2 Outlier Detection Methods
12.2.1 Supervised, Semi-Supervised, and Unsupervised Methods
12.2.2 Statistical Methods, Proximity-Based Methods, and Clustering-Based Methods
12.3 Statistical Approaches
12.3.1 Parametric Methods
12.3.2 Nonparametric Methods
12.4 Proximity-Based Approaches
12.4.1 Distance-Based Outlier Detection and a Nested Loop Method
12.4.2 A Grid-Based Method
12.4.3 Density-Based Outlier Detection
12.5 Clustering-Based Approaches
12.6 Classification-Based Approaches
12.7 Mining Contextual and Collective Outliers
12.7.1 Transforming Contextual Outlier Detection to Conventional Outlier Detection
12.7.2 Modeling Normal Behavior with Respect to Contexts
12.7.3 Mining Collective Outliers
12.8 Outlier Detection in High-Dimensional Data
12.8.1 Extending Conventional Outlier Detection
12.8.2 Finding Outliers in Subspaces
12.8.3 Modeling High-Dimensional Outliers
12.9 Summary
12.10 Exercises
12.11 Bibliographic Notes
Chapter 13 Data Mining Trends and Research Frontiers
13.1 Mining Complex Data Types
13.1.1 Mining Sequence Data: Time-Series, Symbolic Sequences, and Biological Sequences
13.1.2 Mining Graphs and Networks
13.1.3 Mining Other Kinds of Data
13.2 Other Methodologies of Data Mining
13.2.1 Statistical Data Mining
13.2.2 Views on Data Mining Foundations
13.2.3 Visual and Audio Data Mining
13.3 Data Mining Applications
13.3.1 Data Mining for Financial Data Analysis
13.3.2 Data Mining for Retail and Telecommunication Industries
13.3.3 Data Mining in Science and Engineering
13.3.4 Data Mining for Intrusion Detection and Prevention
13.3.5 Data Mining and Recommender Systems
13.4 Data Mining and Society
13.4.1 Ubiquitous and Invisible Data Mining
13.4.2 Privacy, Security, and Social Impacts of Data Mining
13.5 Data Mining Trends
13.6 Summary
13.7 Exercises
13.8 Bibliographic Notes
Bibliography
Index