Further Explorations in Classification
This chapter examines several other classification algorithms, including kNN and naïve Bayes, and looks at how much a classifier can gain from simply adding more data.
- Evaluating classifiers: training sets and test data
- 10-fold cross validation
- Which is better: adding more data or improving the algorithm?
- The kNN algorithm
- Python implementation of kNN
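As a rough illustration of the evaluation topics above, here is a minimal sketch of dividing data into 10 buckets and running 10-fold cross validation. It is not the book's divide.py or crossValidation.py. The function names `make_buckets` and `cross_validate`, the `(label, features)` record layout, and the toy one-dimensional data are all assumptions made for this example.

```python
import random
from collections import defaultdict


def make_buckets(records, n_buckets=10, seed=0):
    """Split (label, features) records into buckets with similar class mix."""
    by_class = defaultdict(list)
    for label, features in records:
        by_class[label].append((label, features))
    rng = random.Random(seed)
    buckets = [[] for _ in range(n_buckets)]
    for items in by_class.values():
        rng.shuffle(items)
        # Deal each class round-robin so every bucket gets a similar share.
        for i, item in enumerate(items):
            buckets[i % n_buckets].append(item)
    return buckets


def cross_validate(buckets, classify):
    """Hold out each bucket once as test data and return overall accuracy."""
    correct = total = 0
    for i, test_bucket in enumerate(buckets):
        # Train on every bucket except the one held out for testing.
        training = [r for j, b in enumerate(buckets) if j != i for r in b]
        for label, features in test_bucket:
            correct += classify(training, features) == label
            total += 1
    return correct / total


def nearest(training, features):
    """Toy 1-nearest-neighbor classifier on a single numeric feature."""
    return min(training, key=lambda r: abs(r[1][0] - features[0]))[0]


# Hypothetical data: class "a" sits near 0, class "b" sits near 10.
records = [("a", (i * 0.1,)) for i in range(20)]
records += [("b", (10 + i * 0.1,)) for i in range(20)]

buckets = make_buckets(records)
accuracy = cross_validate(buckets, nearest)
print(accuracy)
```

Every record is used for testing exactly once and for training nine times, which is the point of the technique: the accuracy estimate does not depend on one lucky or unlucky split.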
Page 13: divide data into buckets: divide.py
Page 14: nearestNeighborClassifier.py from last chapter (please modify to implement 10-fold cross validation).
Page 15: one solution to implementing 10-fold cross validation: crossValidation.py
Page 36: one solution to implementing kNN: pimaKNN.py.
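For readers who want the idea before opening the files, the following is a minimal kNN sketch. It is not the contents of pimaKNN.py. It assumes Euclidean distance, unnormalized numeric features, and a simple majority vote. The name `knn_classify`, the `(label, features)` record layout, and the toy training data are hypothetical.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Straight-line distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_classify(training, item, k=3):
    """Classify item by majority vote among its k nearest training records."""
    neighbors = sorted(training, key=lambda r: euclidean(r[1], item))[:k]
    votes = Counter(label for label, _ in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical training data: two well-separated groups.
training = [
    ("neg", (1.0, 1.0)),
    ("neg", (1.5, 2.0)),
    ("neg", (2.0, 1.0)),
    ("pos", (6.0, 6.0)),
    ("pos", (7.0, 6.5)),
    ("pos", (6.5, 7.0)),
]

print(knn_classify(training, (1.2, 1.4)))
print(knn_classify(training, (6.4, 6.2), k=1))
```

With k=1 this reduces to the plain nearest neighbor classifier. Larger values of k make the vote less sensitive to a single unusual training record.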
Page 13: Auto MPG Data Set (Quinlan 1993)
- Version divided into buckets in the format the book uses: mpgData.zip
- Original Version from the Machine Learning Repository.