Hidden neurons apply the sigmoid function 1/(1+e^-x) to produce continuous activations strictly between 0 and 1, expressing degrees of confidence rather than a binary on/off.
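As a minimal sketch, the sigmoid squashes any real input into the open interval (0, 1):

```python
import math

def sigmoid(x: float) -> float:
    """Logistic sigmoid: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

# Large negative inputs approach 0, zero maps to exactly 0.5,
# and large positive inputs approach 1 — graded confidence, not on/off.
print(sigmoid(-5.0), sigmoid(0.0), sigmoid(5.0))
```

Note that the output never reaches 0 or 1 exactly; a neuron can only become more or less confident, never absolutely certain.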
The output layer applies softmax to produce a probability distribution.
Temperature controls the randomness of sampling: a low temperature sharpens the distribution toward the highest-probability output (near-deterministic), while a high temperature flattens it toward uniform (more exploratory).
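A small sketch of temperature-scaled softmax (the logits here are made-up example values):

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw scores (logits) into a probability distribution.

    Dividing by the temperature before exponentiating controls how
    peaked the distribution is: low T -> near one-hot, high T -> near uniform.
    """
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
print(softmax(logits, temperature=0.1))   # sharply peaked on the first class
print(softmax(logits, temperature=10.0))  # nearly uniform across classes
```

Subtracting the maximum logit before exponentiating is a standard trick to avoid overflow; it does not change the resulting probabilities.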
In backprop mode, the network measures its error with a loss function and adjusts its weights using gradient descent, the same training mechanism that underlies large language models (LLMs).
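The loss-then-adjust loop can be sketched in its simplest form: one weight, a squared-error loss, and repeated steps against the gradient. This is an illustrative toy, not the document's actual network, and the learning rate and targets are assumed values.

```python
def gd_step(w: float, x: float, y_true: float, lr: float = 0.1) -> float:
    """One gradient-descent step for a single linear neuron y_pred = w * x.

    Loss: L = (y_pred - y_true)^2, so dL/dw = 2 * (y_pred - y_true) * x.
    """
    y_pred = w * x
    grad = 2.0 * (y_pred - y_true) * x
    return w - lr * grad  # move the weight against the gradient

w = 0.0
for _ in range(50):
    w = gd_step(w, x=1.0, y_true=2.0)
print(w)  # converges toward 2.0, the weight that zeroes the loss
```

Real networks do exactly this across millions or billions of weights at once, with the chain rule (backpropagation) supplying each weight's gradient.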