Machine Learning for Biologists: Key Points

Introduction

Machine learning algorithms recognize patterns from example data
Supervised learning involves predicting labels from features

Classifying T-cells

The ml4bio software supports interactively exploring different classifiers and hyperparameters on a dataset
The machine learning workflow is split into data preprocessing and selection, training and model selection, and evaluation stages
Splitting a dataset into training, validation, and testing sets is essential for properly evaluating a machine learning method
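
As a rough illustration of the splitting step, the sketch below shuffles a list of samples and cuts it into training, validation, and testing portions. It is not how ml4bio performs the split; the function name split_dataset, the 60/20/20 proportions, and the fixed seed are all assumptions made for this example.

```python
import random


def split_dataset(samples, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle samples and split them into training, validation, and test lists.

    The fractions and the seed are illustrative defaults, not recommendations.
    Whatever is left after the training and validation portions becomes the test set.
    """
    if train_frac <= 0 or val_frac <= 0 or train_frac + val_frac >= 1:
        raise ValueError("fractions must be positive and leave room for a test set")
    shuffled = list(samples)
    # A seeded generator makes the split reproducible.
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, validation, test


samples = list(range(100))  # stand-in for 100 labeled cells
train, validation, test = split_dataset(samples)
print(len(train), len(validation), len(test))
```

The three lists never overlap, so the testing set stays untouched while classifiers and hyperparameters are compared on the validation set.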

Evaluating a Model

The choice of evaluation metric depends on the relative proportions of the classes in the data and on what we want the model to succeed at.
Comparing performance on the validation set with the right metric is an effective way to select a classifier and hyperparameter settings.
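
To see why class proportions matter, consider a small sketch with made-up data: 95 negative and 5 positive examples, and a hypothetical classifier that always predicts the negative class. The helper names and the convention of returning 0.0 when a metric is undefined are assumptions for this example.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true positives, false positives, false negatives, and true negatives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    """Fraction of all predictions that are correct."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)


def precision(y_true, y_pred):
    """Fraction of predicted positives that are truly positive (0.0 if none predicted)."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return tp / (tp + fp) if tp + fp else 0.0


def recall(y_true, y_pred):
    """Fraction of true positives that were found (0.0 if there are none)."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if tp + fn else 0.0


# Made-up imbalanced validation labels: 95 negatives, 5 positives.
y_true = [0] * 95 + [1] * 5
# A hypothetical classifier that ignores its input and always predicts negative.
always_negative = [0] * 100

print("accuracy:", accuracy(y_true, always_negative))
print("recall:", recall(y_true, always_negative))
```

Accuracy looks high here even though the classifier never finds a single positive example, which recall makes obvious. If detecting the rare class is the goal, accuracy alone would be a misleading metric for model selection.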

Decision Trees, Random Forests, and Overfitting

Decision trees require less effort to visualize and interpret than other models
Decision trees are prone to overfitting
Random forests mitigate many of the problems of decision trees, such as overfitting, but are more difficult to interpret
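
The overfitting point can be illustrated with a toy decision tree on a single feature. The sketch below is a simplified, hand-rolled tree (greedy splits chosen by Gini impurity), not the implementation used by ml4bio or any library. The data are made up: the true label is 1 when x >= 0.5, and two training labels are deliberately flipped to act as noise. A random forest is not shown.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def majority(labels):
    """Most common label (ties go to 1)."""
    return int(2 * sum(labels) >= len(labels))


def build_tree(points, max_depth=None):
    """Greedily grow a tree on (x, label) pairs; max_depth=None means no depth limit."""
    labels = [y for _, y in points]
    xs = sorted({x for x, _ in points})
    if max_depth == 0 or len(set(labels)) == 1 or len(xs) == 1:
        return majority(labels)  # leaf node

    def cost(thr):
        # Size-weighted impurity of the two sides of a candidate split.
        left = [y for x, y in points if x <= thr]
        right = [y for x, y in points if x > thr]
        return len(left) * gini(left) + len(right) * gini(right)

    # Candidate thresholds lie halfway between neighbouring x values.
    thr = min(((a + b) / 2 for a, b in zip(xs, xs[1:])), key=cost)
    deeper = None if max_depth is None else max_depth - 1
    return {
        "thr": thr,
        "left": build_tree([p for p in points if p[0] <= thr], deeper),
        "right": build_tree([p for p in points if p[0] > thr], deeper),
    }


def predict(tree, x):
    """Follow the splits until a leaf label is reached."""
    while isinstance(tree, dict):
        tree = tree["left"] if x <= tree["thr"] else tree["right"]
    return tree


def accuracy(tree, points):
    return sum(predict(tree, x) == y for x, y in points) / len(points)


def true_label(x):
    return int(x >= 0.5)


noisy = {3, 15}  # indices whose training label is flipped (simulated noise)
train = [(i / 20, true_label(i / 20) ^ (i in noisy)) for i in range(20)]
validation = [(i / 20 + 0.01, true_label(i / 20 + 0.01)) for i in range(20)]

shallow = build_tree(train, max_depth=1)  # a single split
deep = build_tree(train)                  # grown until every leaf is pure

for name, tree in [("shallow", shallow), ("deep", deep)]:
    print(name, "train:", accuracy(tree, train), "validation:", accuracy(tree, validation))
```

The unrestricted tree fits every training point, including the two flipped labels, and so does worse on the validation points than the single-split tree. That gap between training and validation performance is the signature of overfitting.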

Logistic Regression, Artificial Neural Networks, and Linear Separability

Logistic regression is a linear classifier.
The output of logistic regression is the probability of a given class.
Artificial neural networks can be viewed as an extension of logistic regression
Artificial neural networks can have nonlinear decision boundaries
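
The relationship between the two models can be sketched in a few lines. Logistic regression passes a weighted sum of the features through the logistic (sigmoid) function, so its 0.5-probability boundary is the line where the weighted sum equals zero. Stacking such units gives a small neural network. The weights below are hand-picked for illustration rather than learned, and the XOR-style data are a standard toy example, not taken from the lesson.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real number to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def logistic_regression(x, weights, bias):
    """Probability of the positive class: sigmoid of a weighted sum of features."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return sigmoid(z)


def tiny_network(x):
    """Two hidden logistic units feeding one logistic output unit.

    Weights are set by hand (an assumption for this sketch, not trained values):
    one hidden unit behaves like OR, the other like AND, and the output fires
    when OR is on but AND is off, which is the XOR pattern.
    """
    h_or = logistic_regression(x, [20.0, 20.0], -10.0)
    h_and = logistic_regression(x, [20.0, 20.0], -30.0)
    return logistic_regression([h_or, h_and], [20.0, -20.0], -10.0)


# A point where the weighted sum is zero sits exactly on the linear boundary.
print(logistic_regression([1.0, 1.0], [2.0, -2.0], 0.0))

# XOR-style points: no single straight line separates the two classes.
points = [(0, 0), (0, 1), (1, 0), (1, 1)]
print([round(tiny_network(p)) for p in points])
```

Each unit in the network is itself a logistic regression, which is the sense in which a neural network extends it. The hidden layer is what allows the combined decision boundary to be nonlinear.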

Understanding Machine Learning Literature

Research workflows for machine learning are often not straightforward
Published papers often omit details, which can make it difficult to evaluate their machine learning workflows
Machine learning is used in a large variety of ways in biology

Conclusion and Next Steps

You are now prepared to consider how machine learning may benefit your research.
There are many excellent introductory and intermediate resources to help you continue to learn about machine learning.