Introduction


  • Machine learning algorithms recognize patterns from example data
  • Supervised learning involves predicting labels from features

Classifying T-cells


  • The ml4bio software supports interactively exploring different classifiers and hyperparameters on a dataset
  • The machine learning workflow is split into data preprocessing and selection, training and model selection, and evaluation stages
  • Splitting a dataset into training, validation, and testing sets is essential for properly evaluating a machine learning method
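
The three-way split can be sketched in a few lines of plain Python. This is only an illustration, not how the ml4bio software performs the split: the function name `split_dataset`, the 60/20/20 proportions, and the fixed seed are all assumptions chosen for the example.

```python
import random

def split_dataset(samples, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle samples and split them into training, validation, and test lists.

    The fractions and seed are illustrative defaults, not recommendations.
    """
    if train_frac <= 0 or val_frac < 0 or train_frac + val_frac >= 1:
        raise ValueError("fractions must leave a non-empty test set")
    shuffled = list(samples)               # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # shuffle before splitting to avoid ordering bias
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]      # everything left over is held out for testing
    return train, val, test

samples = list(range(100))                 # stand-in for 100 labeled examples
train, val, test = split_dataset(samples)
print(len(train), len(val), len(test))     # 60 20 20
```

The training set is used to fit a model, the validation set to compare classifiers and hyperparameters, and the test set only once at the end to estimate performance on unseen data.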

Evaluating a Model


  • The choice of evaluation metric depends on the relative proportions of different classes in the data, and what we want the model to succeed at.
  • Comparing performance on the validation set with the right metric is an effective way to select a classifier and hyperparameter settings.
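
A small sketch of why the metric matters when classes are imbalanced. The labels and the two sets of predictions below are made up for illustration, the classifier names are hypothetical, and returning 0.0 for an undefined precision or recall is a convention assumed here.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true positives, false positives, false negatives, true negatives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    return tp, fp, fn, tn

def accuracy(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)

def precision(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return tp / (tp + fp) if tp + fp else 0.0   # assumed convention when nothing is predicted positive

def recall(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if tp + fn else 0.0   # assumed convention when there are no positives

# Made-up validation set: 5 positive examples followed by 95 negative ones.
y_val = [1] * 5 + [0] * 95

# Two hypothetical classifiers' predictions on that validation set.
candidates = {
    "always_negative": [0] * 100,
    "flags_some": [1, 1, 1, 1, 0] + [1] * 10 + [0] * 85,
}

for name, y_pred in candidates.items():
    print(name, accuracy(y_val, y_pred), precision(y_val, y_pred), recall(y_val, y_pred))

# If finding positives is what matters, select on validation recall, not accuracy.
best_by_recall = max(candidates, key=lambda name: recall(y_val, candidates[name]))
best_by_accuracy = max(candidates, key=lambda name: accuracy(y_val, candidates[name]))
print(best_by_recall, best_by_accuracy)
```

Here the classifier that never predicts the rare class has the higher accuracy (0.95 versus 0.89) but a recall of 0, so the two metrics select different models.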

Decision Trees, Random Forests, and Overfitting


  • Decision trees require less effort to visualize and interpret than other models
  • Decision trees are prone to overfitting
  • Random forests mitigate many of the problems of decision trees, such as overfitting, but are more difficult to interpret
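
The overfitting behaviour can be illustrated with a toy tree written in plain Python. This is a simplified sketch under several assumptions: the data are synthetic with a single feature and 20% flipped labels, splits are chosen by misclassification count (libraries typically use Gini impurity or entropy), and the "forest" uses only bootstrap sampling and majority voting, which is one ingredient of a random forest. All function names are hypothetical.

```python
import random

def majority(labels):
    """Most common binary label (ties go to 1)."""
    return int(2 * sum(labels) >= len(labels))

def build_tree(points, max_depth=None):
    """Grow a tree on (x, label) pairs. A leaf is an int; a node is (threshold, left, right)."""
    labels = [y for _, y in points]
    xs = sorted({x for x, _ in points})
    if max_depth == 0 or len(set(labels)) == 1 or len(xs) < 2:
        return majority(labels)
    best_errors, best_thr = None, None
    for thr in xs[:-1]:  # candidate thresholds are observed values, so both sides are non-empty
        left = [y for x, y in points if x <= thr]
        right = [y for x, y in points if x > thr]
        errors = (min(sum(left), len(left) - sum(left))
                  + min(sum(right), len(right) - sum(right)))
        if best_errors is None or errors < best_errors:
            best_errors, best_thr = errors, thr
    next_depth = None if max_depth is None else max_depth - 1
    left_pts = [(x, y) for x, y in points if x <= best_thr]
    right_pts = [(x, y) for x, y in points if x > best_thr]
    return (best_thr, build_tree(left_pts, next_depth), build_tree(right_pts, next_depth))

def predict(tree, x):
    while not isinstance(tree, int):  # descend until a leaf label is reached
        thr, left, right = tree
        tree = left if x <= thr else right
    return tree

def build_forest(points, n_trees, rng):
    """Train each tree on a bootstrap sample (sampling with replacement)."""
    return [build_tree([rng.choice(points) for _ in points]) for _ in range(n_trees)]

def forest_predict(forest, x):
    return majority([predict(tree, x) for tree in forest])

def accuracy(predict_fn, points):
    return sum(predict_fn(x) == y for x, y in points) / len(points)

def make_data(n, noise, rng):
    """Label is 1 when x > 0.5, then flipped with probability `noise`."""
    data = []
    for _ in range(n):
        x = rng.random()
        y = int(x > 0.5)
        if rng.random() < noise:
            y = 1 - y
        data.append((x, y))
    return data

rng = random.Random(0)
train = make_data(100, noise=0.2, rng=rng)
test = make_data(200, noise=0.2, rng=rng)

deep = build_tree(train)                # unlimited depth: grows until every leaf is pure
stump = build_tree(train, max_depth=1)  # a single split
forest = build_forest(train, n_trees=15, rng=rng)

models = {
    "deep tree": lambda x: predict(deep, x),
    "depth-1 tree": lambda x: predict(stump, x),
    "bagged trees": lambda x: forest_predict(forest, x),
}
for name, fn in models.items():
    print(name, "train:", accuracy(fn, train), "test:", accuracy(fn, test))
```

The unrestricted tree fits the training set perfectly because it also memorizes the flipped labels, and a gap between its training and test accuracy is the signature of overfitting. The single-split tree can be read off as one threshold, whereas the ensemble has no comparably simple description.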

Logistic Regression, Artificial Neural Networks, and Linear Separability


  • Logistic regression is a linear classifier.
  • The output of logistic regression is the probability of a given class.
  • Artificial neural networks can be viewed as an extension of logistic regression
  • Artificial neural networks can have nonlinear decision boundaries
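
A minimal sketch of how these points fit together, using hand-picked weights rather than trained ones (all weights, biases, and function names below are assumptions for illustration). Logistic regression passes a weighted sum of the features through a sigmoid, so its predicted probability crosses 0.5 exactly where the weighted sum crosses zero, which is a straight line or hyperplane. Stacking such units in layers gives a small neural network that can represent XOR, a pattern that is not linearly separable.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def logistic_regression(x, weights, bias):
    """Probability of the positive class: sigmoid of a weighted sum of the features."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return sigmoid(z)

# A single logistic unit: probability is 0.5 on the line x1 + x2 - 1 = 0,
# above 0.5 on one side and below 0.5 on the other.
print(logistic_regression((0.5, 0.5), [1.0, 1.0], -1.0))  # 0.5
print(logistic_regression((1.0, 1.0), [1.0, 1.0], -1.0) > 0.5)  # True
print(logistic_regression((0.0, 0.0), [1.0, 1.0], -1.0) < 0.5)  # True

def tiny_network(x):
    """Two hidden logistic units feeding one output logistic unit (hand-set weights)."""
    h_or = logistic_regression(x, [20, 20], -10)      # close to 1 if either input is 1
    h_nand = logistic_regression(x, [-20, -20], 30)   # close to 0 only if both inputs are 1
    return logistic_regression([h_or, h_nand], [20, 20], -30)  # close to 1 if both hidden units fire

xor_inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
xor_predictions = [int(tiny_network(x) >= 0.5) for x in xor_inputs]
print(xor_predictions)  # [0, 1, 1, 0]
```

No choice of weights for a single logistic unit reproduces the XOR labels, because no straight line separates (0, 1) and (1, 0) from (0, 0) and (1, 1). The hidden layer is what makes the nonlinear boundary possible.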

Understanding Machine Learning Literature


  • Research workflows for machine learning are often not straightforward
  • Published papers often omit details, which can make their machine learning workflows difficult to evaluate
  • Machine learning is used in a large variety of ways in biology

Conclusion and next steps


  • You are now prepared to consider how machine learning may benefit your research.
  • There are many excellent introductory and intermediate resources to help you continue to learn about machine learning.