Data Mining for Business Intelligence, Second Edition uses real data and actual cases to illustrate how data mining (DM) can be applied in developing successful business models. Featuring complimentary access to XLMiner®, the Microsoft Office Excel® add-in, this book allows readers to follow along and implement algorithms at their own speed, with a minimal learning curve. In addition, students and practitioners of DM techniques are presented with hands-on, business-oriented applications. An abundance of exercises and examples, now doubled in number in the second edition, is provided to motivate learning and understanding.
This book helps readers understand the beneficial relationship that can be established between DM and smart business practices, and is an excellent learning tool for creating valuable strategies and making wiser business decisions. New topics include detailed coverage of visualization, enhanced by Spotfire subroutines, and time series forecasting, among many others.
The Second Edition now features:
-Three new chapters on time series forecasting, introducing popular business forecasting methods, including moving average, exponential smoothing, and regression-based models, along with topics such as explanatory vs. predictive modeling, two-level models, and ensembles
-A revised chapter on data visualization that now features interactive visualization principles and added assignments that demonstrate interactive visualization in practice
-Separate chapters on k-nearest neighbors and Naïve Bayes methods
-Summaries at the start of each chapter that supply an outline of key topics
NITIN R. PATEL, PhD, is Chairman and cofounder of Cytel, Inc., based in Cambridge, Massachusetts. A Fellow of the American Statistical Association, Dr. Patel has also served as a Visiting Professor at the Massachusetts Institute of Technology for over ten years.
PETER C. BRUCE is President and owner of statistics.com, the leading provider of online education in statistics.
Foreword | |
Preface | |
Acknowledgments | |
Preliminaries | |
Introduction | |
What Is Data Mining? | |
Where Is Data Mining Used? | |
The Origins of Data Mining | |
The Rapid Growth of Data Mining | |
Why Are There So Many Different Methods? | |
Terminology and Notation | |
Road Maps to This Book | |
Overview of the Data Mining Process | |
Introduction | |
Core Ideas in Data Mining | |
Supervised and Unsupervised Learning | |
The Steps in Data Mining | |
Preliminary Steps | |
Building a Model: Example with Linear Regression | |
Using Excel for Data Mining | |
Problems | |
Data Exploration and Dimension Reduction | |
Data Visualization | |
Uses of Data Visualization | |
Data Examples | |
Boston Housing Data | |
Ridership on Amtrak Trains | |
Basic Charts: bar charts, line graphs, and scatterplots | |
Distribution Plots | |
Heatmaps: visualizing correlations and missing values | |
Multidimensional Visualization | |
Adding Variables: color, hue, size, shape, multiple panels, animation | |
Manipulations: rescaling, aggregation and hierarchies, zooming and panning, filtering | |
Reference: trend line and labels | |
Scaling up: large datasets | |
Multivariate plot: parallel coordinates plot | |
Interactive visualization | |
Specialized Visualizations | |
Visualizing networked data | |
Visualizing hierarchical data: treemaps | |
Visualizing geographical data: maps | |
Summary of major visualizations and operations, according to data mining goal | |
Prediction | |
Classification | |
Time series forecasting | |
Unsupervised learning | |
Problems | |
Dimension Reduction | |
Introduction | |
Practical Considerations | |
House Prices in Boston | |
Data Summaries | |
Correlation Analysis | |
Reducing the Number of Categories in Categorical Variables | |
Converting a Categorical Variable to a Numerical Variable | |
Principal Components Analysis | |
Breakfast Cereals | |
Principal Components | |
Normalizing the Data | |
Using Principal Components for Classification and Prediction | |
Dimension Reduction Using Regression Models | |
Dimension Reduction Using Classification and Regression Trees | |
Problems | |
Performance Evaluation | |
Evaluating Classification and Predictive Performance | |
Introduction | |
Judging Classification Performance | |
Benchmark: The Naive Rule | |
Class Separation | |
The Classification Matrix | |
Using the Validation Data | |
Accuracy Measures | |
Cutoff for Classification | |
Performance in Unequal Importance of Classes | |
Asymmetric Misclassification Costs | |
Oversampling and Asymmetric Costs | |
Classification Using a Triage Strategy | |
Evaluating Predictive Performance | |
Benchmark: The Average | |
Prediction Accuracy Measures | |
Problems | |
Prediction and Classification Methods | |
Multiple Linear Regression | |
Introduction | |
Explanatory vs. Predictive Modeling | |
Estimating the Regression Equation and Prediction | |
Example: Predicting the Price of Used Toyota Corolla Automobiles | |
Variable Selection in Linear Regression | |
Reducing the Number of Predictors | |
How to Reduce the Number of Predictors | |
Problems | |
k-Nearest Neighbors (k-NN) | |
The k-NN Classifier | |
Determining Neighbors | |
Classification Rule | |
Example: Riding Mowers | |
Choosing k | |
Setting the Cutoff Value | |
k-NN with More Than 2 Classes | |
k-NN for a Numerical Response | |
Advantages and Shortcomings of k-NN Algorithms | |
Problems | |
Naive Bayes | |
Introduction | |
Predicting Fraudulent Financial Reporting | |
The Practical Difficulty with the Complete (Exact) Bayes Procedure | |
The Solution: Naïve Bayes | |
Predicting Fraudulent Financial Reports, 2 Predictors | |
Predicting Delayed Flights | |
Advantages and Shortcomings of the Naive Bayes Classifier | |
Problems | |
Classification and Regression Trees | |
Introduction | |
Classification Trees | |
Recursive Partitioning | |
Riding Mowers | |
Measures of Impurity | |
Evaluating the Performance of a Classification Tree | |
Acceptance of Personal Loan | |
Avoiding Overfitting | |
Stopping Tree Growth: CHAID | |
Pruning the Tree | |
Classification Rules from Trees | |
Classification Trees for More Than 2 Classes | |
Regression Trees | |
Prediction | |
Measuring Impurity | |
Evaluating Performance | |
Advantages, Weaknesses, and Extensions | |
Problems | |
Logistic Regression | |
Introduction | |
The Logistic Regression Model | |
Example: Acceptance of Personal Loan | |
Model with a Single Predictor | |
Estimating the Logistic Model from Data: Computing Parameter Estimates | |
Interpreting Results in Terms of Odds | |
Evaluating Classification Performance | |
Variable Selection | |
Example of Complete Analysis: Predicting Delayed Flights | |
Data Preprocessing | |
Model Fitting and Estimation | |
Model Interpretation | |
Model Performance | |
Variable Selection | |
Appendix A: Logistic Regression for Profiling | |
Appendix B: Evaluating Goodness of Fit | |
Appendix C: Logistic Regression for More Than Two Classes | |
Problems | |
Neural Nets | |
Introduction | |
Concept and Structure of a Neural Network | |
Fitting a Network to Data | |
Tiny Dataset | |
Computing Output of Nodes | |
Preprocessing the Data | |
Training the Model | |
Classifying Accident Severity | |
Avoiding Overfitting | |
Using the Output for Prediction and Classification | |
Required User Input | |
Exploring the Relationship Between Predictors and Response | |
Advantages and Weaknesses of Neural Networks | |
Problems | |
Discriminant Analysis | |
Introduction | |
Riding Mowers | |
Personal Loan Acceptance | |
Distance of an Observation from a Class | |
Fisher's Linear Classification Functions | |
Classification Performance of Discriminant Analysis | |
Prior Probabilities | |
Unequal Misclassification Costs | |
Classifying More Than Two Classes | |
Medical Dispatch to Accident Scenes | |
Advantages and Weaknesses | |
Problems | |
Mining Relationships Among Records | |
Association Rules | |
Introduction | |
Discovering Association Rules in Transaction Databases | |
Synthetic Data on Purchases of Phone Faceplates | |
Generating Candidate Rules | |
The Apriori Algorithm | |
Selecting Strong Rules | |
Support and Confidence | |
Lift Ratio | |
Data Format | |
The Process of Rule Selection | |
Interpreting the Results | |
Statistical Significance of Rules | |
Rules for Similar Book Purchases | |
Summary | |
Problems | |
Cluster Analysis | |
Introduction | |
Example: Public Utilities | |
Measuring Distance Between Two Records | |
Euclidean Distance | |
Normalizing Numerical Measurements | |
Other Distance Measures for Numerical Data | |
Distance Measures for Categorical Data | |
Distance Measures for Mixed Data | |
Measuring Distance Between Two Clusters | |
Hierarchical (Agglomerative) Clustering | |
Minimum Distance (Single Linkage) | |
Maximum Distance (Complete Linkage) | |
Average Distance (Average Linkage) | |
Dendrograms: Displaying Clustering Process and Results | |
Validating Clusters | |
Limitations of Hierarchical Clustering | |
Nonhierarchical Clustering: The k-Means Algorithm | |
Initial Partition into k Clusters | |
Problems | |
Forecasting Time Series | |
Handling Time Series | |
Introduction | |
Explanatory vs. Predictive Modeling | |
Popular Forecasting Methods in Business | |
Combining Methods | |
Time Series Components | |
Example: Ridership on Amtrak Trains | |
Data Partitioning | |
Problems | |
Regression-Based Forecasting | |
A Model with Trend | |
Linear Trend | |
Exponential Trend | |
Polynomial Trend | |
A Model with Seasonality | |
A model with trend and seasonality | |
Autocorrelation and ARIMA Models | |
Computing Autocorrelation | |
Improving Forecasts by Integrating Autocorrelation Information | |
Evaluating Predictability | |
Problems | |
Smoothing Methods | |
Introduction | |
Moving Average | |
Centered Moving Average for Visualization | |
Trailing Moving Average for Forecasting | |
Choosing Window Width | |
Simple Exponential Smoothing | |
Choosing Smoothing Parameter | |
Relation Between Moving Average and Simple Exponential Smoothing | |
Advanced Exponential Smoothing | |
Series with a trend | |
Series with a trend and seasonality | |
Series with seasonality | |
Problems | |
Cases | |
Cases | |
Charles Book Club | |
German Credit | |
Tayko Software Cataloger | |
Segmenting Consumers of Bath Soap | |
Direct-Mail Fundraising | |
Catalog Cross-Selling | |
Predicting Bankruptcy | |
Time Series Case: Forecasting Public Transportation Demand | |
References | |
Index | |
Table of Contents provided by Publisher. All Rights Reserved. |