Neural networks are function approximators. Before neural networks, we had other
statistical methods, regression being the one most often taught in school.
These are algorithms that find a function which most closely matches a given set of
data. However, these methods often struggle as the dimensionality or complexity of
the inputs and outputs increases. Neural networks, trained with gradient
descent, can more easily find a function approximation for the given
inputs and outputs. This approach lets us apply computational power to create neural
networks that span highly complex problem spaces such as translation or vision.
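To make the idea concrete, here is a minimal sketch of gradient descent as a function approximator. It fits a single weight `w` so that `f(x) = w * x` matches toy data generated from `y = 3x`; the data, learning rate, and step count are illustrative choices, but the same loop scales to the millions of weights in a neural network.

```python
# Toy input/output pairs sampled from the target function y = 3x.
data = [(x, 3.0 * x) for x in range(1, 6)]

w = 0.0    # initial guess for the weight
lr = 0.01  # learning rate (step size)

for step in range(200):
    # Gradient of the mean squared error 0.5 * (w*x - y)^2 with respect to w.
    grad = sum((w * x - y) * x for x, y in data) / len(data)
    w -= lr * grad  # step against the gradient

print(round(w, 3))  # converges toward 3.0
```

Each step nudges `w` in the direction that reduces the error, which is all gradient descent ever does; a real network just repeats this for many weights at once.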

#### Feature Importance

While there can be many inputs to a neural network, not all of them are equally
important. In fact, some inputs can be completely unrelated to the outputs we
are looking for. Feature importance measures how much each input
contributes to the output.

One way of determining feature importance is to vary the value of one input
while holding all the others fixed and measuring how much the output changes.
It is important to repeat this with different fixed random values for the other
inputs, since an input may only affect the output under certain combinations of
the others. For this to work, we also need to keep in mind that the inputs
have to be normalized in some way; otherwise, inputs with a larger scale will
show a correspondingly smaller effect on the output.
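The procedure above can be sketched as follows. This is one possible implementation, assuming `model` is any callable mapping a list of normalized inputs to a scalar output; the `model` used here is a hypothetical stand-in, not a trained network.

```python
import random

def model(inputs):
    # Hypothetical model: the output depends strongly on inputs[0],
    # weakly on inputs[1], and not at all on inputs[2].
    return 2.0 * inputs[0] + 0.1 * inputs[1]

def feature_importance(model, n_features, trials=1000):
    importance = [0.0] * n_features
    for _ in range(trials):
        # Fix the other inputs at fresh random values each trial, since an
        # input may only matter under certain combinations of the others.
        base = [random.uniform(0, 1) for _ in range(n_features)]
        for i in range(n_features):
            perturbed = list(base)
            perturbed[i] = random.uniform(0, 1)  # vary only feature i
            importance[i] += abs(model(perturbed) - model(base))
    # Average output change caused by perturbing each feature.
    return [total / trials for total in importance]

random.seed(0)
scores = feature_importance(model, 3)
print(scores)
```

Because all inputs are drawn from the same normalized range, the scores are comparable: the unrelated third feature scores zero, while the strongly weighted first feature dominates.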

#### Pruning

#### Further Reading