Bayesian Theory, in All Seriousness

Bayesian statistics

Bayesian statistics is a theory in the field of statistics based on the Bayesian interpretation of probability, in which probability expresses a degree of belief in an event. The degree of belief may be based on prior knowledge about the event, such as the results of previous experiments, or on personal beliefs about the event. This differs from a number of other interpretations of probability, such as the frequentist interpretation, which views probability as the limit of the relative frequency of an event after many trials.

Prior probability

The prior probability of a random event or an uncertain proposition is the unconditional probability that is assigned before any relevant evidence is taken into account.

Posterior probability

The posterior probability of a random event or an uncertain proposition is the conditional probability that is assigned after the relevant evidence or background is taken into account.

“Posterior”, in this context, means after taking into account the relevant evidence related to the particular case being examined. For instance, there is a “non-posterior” probability of a person finding buried treasure if they dig in a random spot, and a posterior probability of finding buried treasure if they dig in a spot where their metal detector rings.

Bayes’ theorem

Bayes’ theorem is a fundamental theorem in Bayesian statistics, as it is used by Bayesian methods to update probabilities, which are degrees of belief, after obtaining new data. Given two events $A$ and $B$, the conditional probability of $A$ given that $B$ is true is expressed as follows:
$$ P(A \vert B) = \frac{ P(B \vert A) P(A)}{P(B)} $$
where $P(B) \neq 0$. Although Bayes’ theorem is a fundamental result of probability theory, it has a specific interpretation in Bayesian statistics. In the above equation, $A$ usually represents a proposition (such as the statement that a coin lands on heads fifty percent of the time) and $B$ represents the evidence, or new data that is to be taken into account (such as the result of a series of coin flips). $P(A)$ is the prior probability of $A$, which expresses one’s beliefs about $A$ before the evidence is taken into account. The prior probability may also quantify prior knowledge or information about $A$. $P(B \vert A)$ is the likelihood function, which can be interpreted as the probability of the evidence $B$ given that $A$ is true. The likelihood quantifies the extent to which the evidence $B$ supports the proposition $A$. $P(A \vert B)$ is the posterior probability, the probability of the proposition $A$ after taking the evidence $B$ into account. Essentially, Bayes’ theorem updates one’s prior beliefs $P(A)$ after considering the new evidence $B$.
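
As a minimal sketch (in Python, which the original text does not use), the update is a single line of arithmetic. The function name `bayes_posterior` and the numbers in the usage line are hypothetical, chosen only so that the arithmetic is easy to follow; in practice $P(B)$ usually has to be computed rather than given directly.

```python
def bayes_posterior(prior: float, likelihood: float, evidence: float) -> float:
    """Return P(A|B) = P(B|A) * P(A) / P(B).

    prior      -- P(A), belief in proposition A before seeing the evidence B
    likelihood -- P(B|A), probability of the evidence if A is true
    evidence   -- P(B), probability of the evidence (must be non-zero)
    """
    if evidence == 0:
        raise ValueError("P(B) must be non-zero")
    return likelihood * prior / evidence


# Hypothetical numbers, for illustration only.
p_a = 0.3          # P(A)
p_b_given_a = 0.8  # P(B|A)
p_b = 0.5          # P(B)
print(bayes_posterior(p_a, p_b_given_a, p_b))  # approximately 0.48
```

The posterior (0.48) is larger than the prior (0.3) because the evidence is more probable when $A$ is true ($0.8$) than it is overall ($0.5$).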

The probability of the evidence, $P(B)$, can be calculated using the law of total probability. If $\{A_1, A_2, \ldots, A_n\}$ is a partition of the sample space (the set of all outcomes of an experiment), that is, a collection of mutually exclusive events whose union is the whole sample space, then
$$ P(B) = P(B \vert A_1) P(A_1) + P(B \vert A_2) P(A_2) + \cdots + P(B \vert A_n) P(A_n) = \sum_i P(B \vert A_i) P(A_i) $$
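
The sketch below applies the law of total probability to a hypothetical two-hypothesis partition. The coin is either fair or lands heads 80% of the time, the prior belief in each is equal, and the evidence is three heads in three flips. The helper names `total_probability` and `posteriors` are illustrative and do not come from any library.

```python
def total_probability(priors, likelihoods):
    """P(B) = sum_i P(B|A_i) P(A_i) for a partition {A_1, ..., A_n}."""
    if len(priors) != len(likelihoods):
        raise ValueError("need one likelihood per hypothesis")
    if abs(sum(priors) - 1.0) > 1e-9:
        raise ValueError("priors over a partition must sum to 1")
    return sum(lik * p for lik, p in zip(likelihoods, priors))


def posteriors(priors, likelihoods):
    """P(A_i|B) for every A_i: each term of the sum divided by P(B)."""
    evidence = total_probability(priors, likelihoods)
    if evidence == 0:
        raise ValueError("P(B) must be non-zero")
    return [lik * p / evidence for lik, p in zip(likelihoods, priors)]


# Hypothetical partition: A_1 = "coin is fair" (heads with probability 0.5),
# A_2 = "coin is biased" (heads with probability 0.8), equal prior belief.
# Evidence B: three heads in three independent flips.
priors = [0.5, 0.5]
likelihoods = [0.5 ** 3, 0.8 ** 3]  # P(B|A_1) = 0.125, P(B|A_2) = 0.512

print(total_probability(priors, likelihoods))  # approximately 0.3185
print(posteriors(priors, likelihoods))         # approximately [0.196, 0.804]
```

Because every term is divided by the same $P(B)$, the posteriors sum to 1. Three heads shift belief toward the biased coin, to roughly 0.80 versus 0.20.
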
When the hypotheses form a continuum rather than a finite partition (for example, every possible value of a coin's probability of landing heads), the sum becomes an integral over all of them. In either case, $P(B)$ is often difficult to calculate, because the sum or integral can be time-consuming or intractable to evaluate. Since $P(B)$ does not depend on $A$, it is the same normalizing constant for every hypothesis within a given analysis, so it is often enough to work with the product of the prior and the likelihood. The posterior is proportional to this product:
$$ P(A \vert B) \propto P(B \vert A) P(A) $$
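
One common way to use this proportionality when the proposition is a continuous quantity is a grid approximation. You evaluate likelihood × prior at many candidate values and then divide by the sum, which stands in for $P(B)$. The following is a minimal sketch under stated assumptions: a coin with unknown heads probability $\theta$, a uniform prior on $[0, 1]$, a binomial likelihood, and 7 heads in 10 flips. The function name `grid_posterior` is hypothetical.

```python
def grid_posterior(heads: int, flips: int, grid_size: int = 1001):
    """Approximate the posterior over a coin's heads probability theta.

    Assumes a uniform prior on [0, 1] and a binomial likelihood; both are
    illustrative choices. The integral for P(B) is replaced by a sum over
    evenly spaced grid points.
    """
    thetas = [i / (grid_size - 1) for i in range(grid_size)]
    prior = [1.0] * grid_size  # uniform prior, left unnormalized
    # Unnormalized posterior = likelihood * prior. The binomial coefficient is
    # omitted because it is the same constant for every theta and cancels.
    unnorm = [t ** heads * (1 - t) ** (flips - heads) * p
              for t, p in zip(thetas, prior)]
    total = sum(unnorm)  # normalizing constant, playing the role of P(B)
    return thetas, [u / total for u in unnorm]


thetas, post = grid_posterior(heads=7, flips=10)
mean = sum(t * w for t, w in zip(thetas, post))
print(round(mean, 3))  # 0.667
```

With a uniform prior, the exact posterior is a Beta(8, 4) distribution with mean $8/12$. The grid result can therefore be checked against it. The normalizing constant never had to be derived analytically, which is the practical payoff of working with $P(A \vert B) \propto P(B \vert A) P(A)$.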
