*Once a network has been structured for a particular application, that network is ready to be trained.

*To start this process the initial weights are chosen randomly. Then, the training, or learning, begins.

*There are two approaches to training - supervised and unsupervised.

Supervisied training:

Supervised training involves a mechanism of providing the network with the desired output either by manually "grading" the network's performance or by providing the desired outputs with the inputs.

Unsupervisied training:

Unsupervised training is where the network has to make sense of the inputs without outside help.

*Overfitting occurs when a statistical model or machine learning algorithm captures the noise of the data.

*Overfitting is often a result of an excessively complicated model, and it can be prevented by fitting multiple models and using validation or cross-validation to compare their predictive accuracies on test data.

Underfitting:

*Underfitting occurs when a statistical model or machine learning algorithm cannot capture the underlying trend of the data.

*Underfitting occurs when the model or the algorithm does not fit the data well enough. Specifically, underfitting occurs if the model or algorithm shows low variance but high bias.

The amount of weights are updated during traning is learning rate.

The learning rate is a configurable hyperparameter used in the training of neural networks that has a small positive value.

Gradient descent:

*Gradient descent is a first-order iterative optimization algorithm for finding the minimum of a function.

*To find a local minimum of a function using gradient descent, one takes steps proportional to the negative of the gradient (or approximate gradient) of the function at the current point. If, instead, one takes steps proportional to the positive of the gradient, one approaches a local maximum of that function; the procedure is then known as gradient ascent.

Loss function or cost function is a function that maps an event or values of one or more variables onto a real number intuitively representing some "cost" associated with the event.

An optimization problem seeks to minimize a loss function.