Loss Function
Suppose we want to solve an image classification problem, for example deciding whether an image shows a cow or a cat. A machine learning algorithm scores an unclassified image against each class and decides which class the image belongs to based on those scores. One of the keys to such a classification algorithm is designing the loss function, which measures how well the scores agree with the true labels.
Mapping image pixels to a confidence score for each class
Assume a training set:
\[(x_i,y_i)\]
\(x_i\) is the image and \(y_i\) is its corresponding class label
\(i \in \{1, \dots, N\}\) means the training set contains N images
\(y_i \in \{1, \dots, K\}\) means there are K image categories
A score function then maps each image \(x_i\) to a vector of K class scores:
\[f(x_i,W,b)=W\cdot x_i+b\]
In the above function, each image \(x_i\) is flattened into a one-dimensional vector.
If an image is 32x32 pixels with 3 channels, \(x_i\) will be a one-dimensional vector of length \(D = 32 \times 32 \times 3 = 3072\).
The parameter matrix W has size [KxD] and is often called the weights.
b, of size [Kx1], is often called the bias vector.
In this way, W evaluates \(x_i\)'s confidence scores for all K categories at the same time.
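
To make the shapes concrete, here is a minimal NumPy sketch of the score function above. It is only an illustration: the function name `linear_scores`, the choice of K = 2 classes, the random stand-in image, and the small random initial weights are all assumptions made for this example, not part of the original setup.

```python
import numpy as np

def linear_scores(x, W, b):
    """Compute f(x, W, b) = W . x + b for one flattened image.

    x: column vector of shape (D, 1)
    W: weight matrix of shape (K, D)
    b: bias vector of shape (K, 1)
    Returns a (K, 1) column of class scores.
    """
    return W @ x + b

# Assumed sizes: K = 2 classes (say cat and cow), 32x32 images with 3 channels.
K = 2
height, width, channels = 32, 32, 3
D = height * width * channels  # 32 * 32 * 3 = 3072

rng = np.random.default_rng(0)
image = rng.random((height, width, channels))  # stand-in for real pixel data
x = image.reshape(D, 1)                        # flatten into a [D x 1] vector

W = 0.01 * rng.standard_normal((K, D))         # weights, [K x D] (assumed init)
b = np.zeros((K, 1))                           # bias vector, [K x 1]

scores = linear_scores(x, W, b)                # one score per class, [K x 1]
predicted_class = int(np.argmax(scores))       # index of the highest score
```

Each row of W acts as a separate linear scorer for one class, so a single matrix product yields all K scores at once. With untrained weights like these the prediction is meaningless; the point is only the shapes and the single matrix-vector product.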