Introduction
- Machine learning algorithms recognize patterns from example data
- Supervised learning involves predicting labels from features
Classifying T-cells
- The ml4bio software supports interactively exploring different classifiers and hyperparameters on a dataset
- The machine learning workflow is split into data preprocessing and selection, training and model selection, and evaluation stages
- Splitting a dataset into training, validation, and testing sets is key to being able to properly evaluate a machine learning method
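
The three-way split can be sketched in a few lines of plain Python. This is a minimal illustration, not how the ml4bio software performs its split: the function name `split_dataset`, the default fractions, and the `cell_*` sample names are all assumptions made for the example.

```python
import random


def split_dataset(samples, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle samples and split them into training, validation, and test lists.

    Hypothetical helper for illustration; the fractions are arbitrary defaults.
    """
    if val_fraction < 0 or test_fraction < 0 or val_fraction + test_fraction >= 1:
        raise ValueError("fractions must be non-negative and sum to less than 1")

    shuffled = list(samples)
    # A fixed seed makes the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)

    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * val_fraction)

    test = shuffled[:n_test]
    validation = shuffled[n_test:n_test + n_val]
    training = shuffled[n_test + n_val:]
    return training, validation, test


# Made-up sample identifiers standing in for real data.
cells = [f"cell_{i}" for i in range(100)]
training, validation, test = split_dataset(cells)
print(len(training), len(validation), len(test))
```

The training set is used to fit a model, the validation set to compare classifiers and hyperparameters, and the test set is touched only once, for the final evaluation. Keeping the three sets disjoint is what makes that final estimate trustworthy.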
Evaluating a Model
- The choice of evaluation metric depends on the relative proportions of the classes in the data and on what we want the model to succeed at.
- Comparing performance on the validation set with the right metric is an effective way to select a classifier and hyperparameter settings.
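
To see why class proportions matter, consider a small sketch with made-up labels in which only 5 of 100 samples are positive. The `evaluate` helper and both sets of predictions are hypothetical and exist only for this illustration.

```python
def evaluate(y_true, y_pred, positive=1):
    """Return accuracy, precision, and recall, treating one class as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Report 0.0 when a ratio is undefined (no predicted or no actual positives).
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Imbalanced, made-up validation labels: 5 positives followed by 95 negatives.
y_true = [1] * 5 + [0] * 95

# Classifier A never predicts the positive class.
always_negative = [0] * 100
# Classifier B finds 4 of the 5 positives but raises 10 false alarms.
sensitive = [1, 1, 1, 1, 0] + [1] * 10 + [0] * 85

baseline = evaluate(y_true, always_negative)
screen = evaluate(y_true, sensitive)
print("always negative:", baseline)
print("sensitive:      ", screen)
```

Judged by accuracy alone, the classifier that never finds a positive sample looks better. Judged by recall, it is useless. Which metric to compare on the validation set therefore depends on which kind of mistake is more costly for the question at hand.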
Decision Trees, Random Forests, and Overfitting
- Decision trees require less effort to visualize and interpret than other models
- Decision trees are prone to overfitting
- Random forests mitigate many of the problems of decision trees, such as overfitting, but are more difficult to interpret
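
Overfitting shows up as a gap between training and validation accuracy. The sketch below assumes NumPy and scikit-learn are installed and uses synthetic data invented for this example: the label depends on one feature, and 20% of labels are flipped to act as noise. The model settings are illustrative choices, not recommendations.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Synthetic data: 400 samples, 5 features, label determined by the first feature.
X = rng.normal(size=(400, 5))
y = (X[:, 0] > 0).astype(int)

# Flip roughly 20% of the labels so that some training labels are pure noise.
flip = rng.random(400) < 0.2
y[flip] = 1 - y[flip]

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.5, random_state=0
)

models = {
    "unrestricted tree": DecisionTreeClassifier(random_state=0),
    "depth-2 tree": DecisionTreeClassifier(max_depth=2, random_state=0),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # Store (training accuracy, validation accuracy) for each model.
    scores[name] = (model.score(X_train, y_train), model.score(X_val, y_val))
    print(f"{name}: train={scores[name][0]:.2f} validation={scores[name][1]:.2f}")
```

The unrestricted tree keeps splitting until it has memorized the training set, noise included, so its training accuracy says little about how it will do on new samples. Limiting the depth or averaging many trees in a forest are two common ways to narrow that gap, and the validation column is the one to compare.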
Logistic Regression, Artificial Neural Networks, and Linear Separability
- Logistic regression is a linear classifier
- The output of logistic regression is the probability of a given class
- Artificial neural networks can be viewed as an extension of logistic regression
- Artificial neural networks can have nonlinear decision boundaries
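
The relationship between the two models can be sketched without any library. Below, `logistic_unit` is logistic regression written as a single function, and `tiny_network` chains three such units. The weights are set by hand for illustration rather than learned from data, and all names are hypothetical.

```python
import math


def sigmoid(z):
    """Squash any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def logistic_unit(inputs, weights, bias):
    """Logistic regression: a weighted sum of the inputs passed through a sigmoid.

    The output is read as the probability of the positive class. The decision
    boundary (probability 0.5) is where the weighted sum equals zero, which is
    a straight line for two features.
    """
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)


def tiny_network(inputs):
    """Two hidden logistic units feeding one output logistic unit.

    Hand-picked weights (an assumption for this sketch, not trained values)
    make the hidden units behave like OR and NAND, and the output like AND.
    """
    h_or = logistic_unit(inputs, [20.0, 20.0], -10.0)
    h_nand = logistic_unit(inputs, [-20.0, -20.0], 30.0)
    return logistic_unit([h_or, h_nand], [20.0, 20.0], -30.0)


# XOR: the positive points sit on opposite corners, so no single straight
# line separates the two classes.
xor_points = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
for point, label in xor_points.items():
    print(point, "label:", label, "predicted probability:", round(tiny_network(point), 3))
```

Each unit on its own is a linear classifier, but stacking them lets the network combine two linear boundaries into a nonlinear one. That is the sense in which a neural network extends logistic regression.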
Understanding Machine Learning Literature
- Research workflows for machine learning are often not straightforward
- Published papers often omit details, which can make it difficult to evaluate their machine learning workflows
- Machine learning is used in a large variety of ways in biology
Conclusion and Next Steps
- You are now prepared to consider how machine learning may benefit your research.
- There are many excellent introductory and intermediate resources to help you continue to learn about machine learning.