Why exponential term for weighing observations?
Note the nature of the observation weights in the training algorithm.
$$ w_\nlabeledsmall^{(m+1)} \leftarrow w_\nlabeledsmall^{(m)} \exp \left[ -\alpha_m y_\nlabeledsmall H_m(\vx_\nlabeledsmall) \right],~~~~ \forall \nlabeledsmall = 1,\ldots,\nlabeled $$
When this term was originally proposed, its primary motivation was computational convenience. Why?
Suppose the overall classifier after the first \( m \) weak classifiers have been trained is \( F_m \).
$$ F_m(\vx) = \sign\left( \sum_{t=1}^m \alpha_t H_t(\vx) \right) $$
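This weighted vote is straightforward to sketch in code. In the minimal example below, the decision stumps and the \( \alpha_t \) values are made up purely for illustration; they are not part of the derivation.

```python
# A sketch of the overall classifier F_m as an alpha-weighted vote of
# weak classifiers. The stumps and alpha_t values are illustrative only.
def stump(threshold):
    """A weak classifier H_t: a decision stump returning a label in {-1, +1}."""
    return lambda x: 1.0 if x > threshold else -1.0

weak_classifiers = [stump(0.0), stump(1.5), stump(-0.5)]  # H_1, H_2, H_3
alphas = [0.8, 0.4, 0.6]                                  # alpha_1..alpha_3

def F(x):
    """Overall classifier: sign of the weighted sum of weak-classifier votes."""
    score = sum(a * h(x) for a, h in zip(alphas, weak_classifiers))
    return 1.0 if score >= 0 else -1.0
```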
The error of this classifier on an example \( \vx_\nlabeledsmall \) is
$$ \text{err}_{F_m}(\vx_\nlabeledsmall) = \indicator{y_\nlabeledsmall \ne F_m(\vx_\nlabeledsmall)} $$
Note that \( y_\nlabeledsmall \in \set{-1,1} \) and \( F_m(\vx_\nlabeledsmall) \in \set{-1,1} \).
Whenever \( y_\nlabeledsmall \ne F_m(\vx_\nlabeledsmall) \), it is the case that \( y_\nlabeledsmall F_m(\vx_\nlabeledsmall) \le 0 \).
Thus, the error of the overall classifier \( F_m \) on an example \( \vx_\nlabeledsmall \) after \( m \) weak classifiers have been trained can be written as
$$ \text{err}_{F_m}(\vx_\nlabeledsmall) = \indicator{y_\nlabeledsmall F_m(\vx_\nlabeledsmall) \le 0} $$
Substituting \( F_m \) in terms of the constituent weak classifiers, we get
$$ \text{err}_{F_m}(\vx_\nlabeledsmall) = \indicator{y_\nlabeledsmall \sum_{t=1}^m \alpha_t H_t(\vx_\nlabeledsmall) \le 0} $$
Here, we have dropped the \( \sign \) function: the weighted sum and its sign agree in sign, so the value of the indicator is unchanged.
Now, it is the case that for any \( p \in \real \)
$$ \indicator{p \le 0} \le \exp\left(-p \right) $$
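This inequality holds for every real \( p \); a quick numerical sanity check over a few representative values:

```python
import math

# Numerical check of the bound 1[p <= 0] <= exp(-p). The sample points
# are arbitrary; the bound holds for all real p.
def indicator_nonpos(p):
    """The 0/1 indicator of the event p <= 0."""
    return 1.0 if p <= 0 else 0.0

for p in (-2.0, -0.5, 0.0, 0.3, 5.0):
    assert indicator_nonpos(p) <= math.exp(-p)
```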
Thus, the error of classifier \( F_m \) is upper bounded as follows.
$$ \text{err}_{F_m}(\vx_\nlabeledsmall) \le \exp \left[-y_\nlabeledsmall \sum_{t=1}^m \alpha_t H_t(\vx_\nlabeledsmall) \right] $$
Since the exponential of a sum factors into a product of exponentials, this is equivalent to
$$ \text{err}_{F_m}(\vx_\nlabeledsmall) \le \exp \left[-y_\nlabeledsmall \sum_{t=1}^{m-1} \alpha_t H_t(\vx_\nlabeledsmall) \right] \exp \left[-y_\nlabeledsmall \alpha_m H_m(\vx_\nlabeledsmall) \right] $$
The first factor is exactly the upper bound obtained for \( \text{err}_{F_{m-1}}(\vx_\nlabeledsmall) \); identifying it with that error (as an upper bound), we get
$$ \text{err}_{F_m}(\vx_\nlabeledsmall) \le \text{err}_{F_{m-1}}(\vx_\nlabeledsmall) \exp \left[-\alpha_m y_\nlabeledsmall H_m(\vx_\nlabeledsmall) \right] $$
If the weight of an instance \( \vx_\nlabeledsmall \) is to be proportional to the error of the current classifier, as intuitively desired in boosting, it would then be the case that
$$ w_\nlabeledsmall^{(m+1)} \leftarrow w_\nlabeledsmall^{(m)} \exp\left[ - \alpha_m y_\nlabeledsmall H_m(\vx_\nlabeledsmall) \right] $$
That is how you arrive at the exponential term for the observation weights. In other words, the exponential term enables a modular, recursive weight update that depends only on the weights from the previous iteration.
Without the exponential term, another scoring function might require recomputing the overall classifier's error from scratch after each weak classifier is trained, which would be computationally expensive.
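The recursive update can be sanity-checked numerically: applying the per-round multiplicative update reproduces the closed-form exponential bound without ever re-evaluating the full sum. The \( \alpha_t \) values and weak-classifier votes below are made up for a single example \( (\vx, y) \).

```python
import math
import random

# Sketch: the per-round update w^(m+1) = w^(m) * exp(-alpha_m * y * H_m(x))
# reproduces the closed form w^(1) * exp(-y * sum_t alpha_t * H_t(x)).
# The alpha_t values and votes are randomly generated for illustration.
random.seed(0)
y = 1.0
alphas = [random.random() for _ in range(5)]            # alpha_1..alpha_5
votes = [random.choice((-1.0, 1.0)) for _ in range(5)]  # H_t(x) in {-1, +1}

w0 = 0.2  # initial weight w^(1) of this example
w = w0
for alpha, h in zip(alphas, votes):
    w *= math.exp(-alpha * y * h)  # recursive update, O(1) per round

# Direct evaluation of the full exponential term
direct = w0 * math.exp(-y * sum(a * h for a, h in zip(alphas, votes)))
assert math.isclose(w, direct)
```

The recursion needs only the previous round's weight and the newest weak classifier's vote, which is precisely the computational-ease argument made above.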