Neural networks are function approximators. Before neural networks, we had other statistical methods, regression being the one most often taught in school. These are algorithms that find a function which most closely matches a given set of data. However, such methods often struggle as the dimensionality or complexity of the inputs and outputs increases. Neural networks, trained by gradient descent, can more easily find a function approximation for the given inputs and outputs. This approach lets us apply computational power to build neural networks that span a large complexity space, such as translation or vision.
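As a minimal sketch of the idea, here is gradient descent fitting a single-neuron "network" (one weight, one bias) to noisy data drawn from a line. The data, learning rate, and iteration count are all illustrative choices, not anything prescribed by the text:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: y = 3x + 2 plus a little noise.
x = rng.uniform(-1, 1, size=200)
y = 3.0 * x + 2.0 + rng.normal(0, 0.05, size=200)

# A single-neuron model: y_hat = w * x + b, fit by gradient descent
# on the mean squared error.
w, b = 0.0, 0.0
lr = 0.1
for _ in range(500):
    y_hat = w * x + b
    err = y_hat - y
    w -= lr * 2 * np.mean(err * x)  # dMSE/dw
    b -= lr * 2 * np.mean(err)      # dMSE/db

print(w, b)  # close to the true 3 and 2
```

The same loop, with more parameters and nonlinearities stacked between input and output, is what lets a neural network approximate far more complex functions than a hand-picked regression form.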
While a neural network can have many inputs, not all of them are equally important. In fact, some inputs can be completely unrelated to the outputs we are looking for. Feature importance measures how much each input contributes to the output.
One way of determining feature importance is to vary the value of one input while holding all the others constant and seeing how much the output changes. It is important to repeat this with several different fixed random values for the other variables, since an input may only affect the output for certain combinations of the other inputs. The other thing to keep in mind is that the inputs have to be normalized in some way; otherwise the effects cannot be compared fairly, because a perturbation of a given size means much less for an input with a large natural range than for one with a small range.