A Comparison of Machine Learning Techniques for Taxonomic Classification of Teeth from the Family Bovidae


This study compares the performance of modern machine learning algorithms for classifying fossil teeth from the Family Bovidae. Isolated bovid teeth are typically the most common fossils found in southern Africa, and they often form the basis for paleoenvironmental reconstructions. Taxonomic identification of fossil bovid teeth, however, is often imprecise and subjective. Using modern teeth of known taxa, machine learning algorithms can be trained to classify fossils. Previous work by Brophy et al. (2014) used elliptical Fourier analysis of the form (size and shape) of the outline of the occlusal surface of each tooth as features in a linear discriminant analysis framework. This manuscript expands on that work by exploring how different machine learning approaches classify the teeth and by testing which technique classifies them best. Five machine learning techniques were used to build the classification models: linear discriminant analysis, neural networks, nuclear penalized multinomial regression, random forests, and support vector machines. Support vector machines and random forests perform best in terms of both log-loss and misclassification rate, and both improve on linear discriminant analysis. Applying these methods allows bovid teeth to be classified with higher accuracy.
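
The two evaluation criteria named above, log-loss and misclassification rate, can be illustrated with a minimal sketch. It is not the authors' code: the class labels ("A", "B", "C"), the predicted probabilities, and the function names are hypothetical and serve only to show how each metric is computed from a classifier's predicted class probabilities.

```python
import math

def log_loss(y_true, probs, eps=1e-15):
    """Mean multiclass log-loss.

    y_true: list of true class labels.
    probs:  list of dicts mapping class label -> predicted probability.
    """
    total = 0.0
    for label, p in zip(y_true, probs):
        # Clip the probability of the true class to avoid log(0).
        p_true = min(max(p.get(label, 0.0), eps), 1.0 - eps)
        total -= math.log(p_true)
    return total / len(y_true)

def misclassification_rate(y_true, probs):
    """Fraction of cases whose most probable class differs from the true class."""
    wrong = sum(
        1 for label, p in zip(y_true, probs) if max(p, key=p.get) != label
    )
    return wrong / len(y_true)

# Hypothetical labels and predictions (illustration only, not study data).
y_true = ["A", "B", "A", "C"]
probs = [
    {"A": 0.7, "B": 0.2, "C": 0.1},
    {"A": 0.3, "B": 0.6, "C": 0.1},
    {"A": 0.4, "B": 0.5, "C": 0.1},  # predicted "B", truth is "A"
    {"A": 0.1, "B": 0.1, "C": 0.8},
]

print(misclassification_rate(y_true, probs))
print(log_loss(y_true, probs))
```

Misclassification rate only counts whether the top-ranked class is correct, while log-loss also penalizes confident wrong probabilities, so the two criteria can rank classifiers differently.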

Journal of Applied Statistics