Supervised Learning Models
Introduction
This note is a general introduction to DL concepts. I will be following the textbook by Simon J. D. Prince, the first reference. The first part of the textbook introduces deep learning models and discusses how to train them, measure their performance, and improve this performance. The next part considers architectures that are specialized to images, text, and graph data [1]. The author notes that, ironically, ‘no-one really understands deep learning at the time of writing’.
AI is a broad term that encompasses logic, search, and probabilistic reasoning. ML is a subset of AI that uses data to learn models.
A deep neural network is an ML model; when such models are fitted to data, this is referred to as deep learning (DL). These models are both powerful and practical.
See Figure 1.1, which coarsely shows three areas of ML; DL is applied in all of them.

Supervised Learning Models
These models map input data to an output prediction. Figure 1.2 shows various supervised learning models, including regression and classification models; note in particular the multivariate regression and multiclass classification models.

Inputs are usually represented as vectors. Tabular data has no internal structure: if the order of the inputs is changed, the model's prediction should not change. However, images, text, and graphs do have internal structure. For example, the order of the pixels in an image is important.
The model represents a family of equations mapping the input to the output (i.e., a family of different cyan curves). The particular equation (curve) is chosen using training data (examples of input/output pairs)[1]. See Figure 1.3.

When we talk about training or fitting a model, we mean that we search through the family of possible equations (possible cyan curves) relating input to output to find the one that describes the training data most accurately [1]. See Figure 1.3.
Deep neural networks are equations that can represent an extremely broad family of relationships between input and output, and where it is particularly easy to search through this family to find the relationship that describes the training data[1].
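To make "searching through a family of equations" concrete, here is a minimal sketch (not from the book) where the family is all lines y = ax + b and least squares selects the member that best matches some toy training data; the data-generating function and the use of NumPy are my assumptions:

```python
import numpy as np

# Hypothetical training data: noisy samples of y = 2x + 1
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=20)
y = 2.0 * x + 1.0 + 0.1 * rng.standard_normal(20)

# The "family of equations" is every line y = a*x + b; fitting means
# searching this family for the member that describes the data best.
A = np.stack([x, np.ones_like(x)], axis=1)
(a, b), *_ = np.linalg.lstsq(A, y, rcond=None)
# a and b should land near the true values 2 and 1
```

A deep network plays the same role as the line here, but it represents a vastly broader family of input-output relationships.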
Deep neural networks can process inputs that are very large, of variable length, and contain various kinds of internal structures. They can output
- single real numbers (regression),
- multiple numbers (multivariate regression), or
- probabilities over two or more classes (binary and multiclass classification, respectively)[1]. See Figure 1.4.
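For the multiclass case, a network typically produces one raw score per class, and a softmax turns those scores into probabilities. A minimal sketch (the scores here are made up for illustration):

```python
import numpy as np

def softmax(logits):
    # Subtracting the max keeps the exponentials numerically stable
    z = np.exp(logits - np.max(logits))
    return z / z.sum()

# Three raw class scores -> probabilities that are positive and sum to 1
probs = softmax(np.array([2.0, 1.0, 0.1]))
```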

Unsupervised Learning Models
The book focuses on generative unsupervised models. These models learn to generate new data that is similar to the training data: they describe the distribution of the training data and then sample from this distribution to generate new examples. Some models instead learn a mechanism for generating new examples without explicitly modeling the distribution of the training data. See Figures 1.5 to 1.8.
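The describe-then-sample idea can be sketched in one dimension, where the "model" of the training distribution is just a fitted Gaussian (a deliberate oversimplification of what a deep generative model does):

```python
import numpy as np

# Hypothetical 1-D training data
rng = np.random.default_rng(1)
data = rng.normal(loc=5.0, scale=2.0, size=1000)

# Step 1: describe the distribution of the training data (Gaussian fit)
mu, sigma = data.mean(), data.std()

# Step 2: sample from that distribution to generate new, similar data
new_examples = rng.normal(mu, sigma, size=5)
```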

Some models use latent variables to generate new data. We can describe each data example using a smaller number of underlying latent variables. Here, the role of deep learning is to describe the mapping between these latent variables and the data [1]. See Figures 1.10 and 1.11.

Reinforcement Learning Models
Reinforcement learning models learn to make a sequence of decisions. The model receives a reward for each decision it makes and learns to choose decisions that maximize the total reward it receives.
In the chess example, the network would learn a mapping from the current state of the board to the choice of move (figure 1.13)[1].
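The "total reward" being maximized is usually a discounted sum over time steps; here is a minimal sketch of that computation (the discount factor and the reward values are illustrative assumptions, not from the note):

```python
def discounted_return(rewards, gamma=0.9):
    # Each reward at time step t is weighted by gamma**t, so earlier
    # rewards count more than later ones.
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

# Three unit rewards: 1 + 0.9 + 0.81 = 2.71
total = discounted_return([1.0, 1.0, 1.0])
```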

Ethics
Bias and fairness: Careless application of algorithmic decision-making using AI has the potential to entrench or aggravate existing biases. See Binns (2018) for further discussion[1].
Explainability: It remains unknown whether it is possible to build complex decision-making systems that are fully transparent to their users or even their creators [1].
Book Structure
We will cover the following chapters (most of them hopefully) in the book:
Chapters 2-9, which cover supervised learning models, and Chapters 10-13, which cover various DL architectures.
We will get into unsupervised learning as time permits.
Other Books
I suggest the following books for further reading:
- Deep Learning Illustrated by Krohn et al. (2019) [2]
- Neural Networks and Deep Learning by Michael Nielsen (2015) [3]
If you are interested in a popular book on the subject that you can read in a weekend, I suggest the book ‘Why Machines Learn: The Elegant Math Behind Modern AI’ by Anil Ananthaswamy (2024) [4].
The textbook describes itself as a ‘constant work in progress’ and reminds the reader that “Mathematics, you see, is not a spectator sport,” encouraging them to work through the exercises and examples.
References
1: Prince, S. J. D. (2023). Understanding Deep Learning. MIT Press.
2: Krohn, J., Beyleveld, G., & Bassens, A. (2019). Deep Learning Illustrated: A Visual, Interactive Guide to Artificial Intelligence. Addison-Wesley Professional.
3: Nielsen, M. (2015). [Neural Networks and Deep Learning](http://neuralnetworksanddeeplearning.com/index.html).
4: Ananthaswamy, A. (2024). Why Machines Learn: The Elegant Math Behind Modern AI.