Bayes' theorem:

$$P(A_i \mid B) = \frac{P(B \mid A_i)\,P(A_i)}{\sum_{i=1}^{n} P(B \mid A_i)\,P(A_i)}$$
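To make the formula concrete, here is a minimal numeric sketch in Python; the three hypotheses A_1..A_3 and their priors and likelihoods are made-up illustration values, not from the text.

```python
# Bayes' theorem over a partition A_1..A_n.
# Priors and likelihoods are made-up illustration values.
priors = [0.5, 0.3, 0.2]        # P(A_i)
likelihoods = [0.9, 0.5, 0.1]   # P(B | A_i)

# Total probability: P(B) = sum_i P(B | A_i) P(A_i)
p_b = sum(l * p for l, p in zip(likelihoods, priors))

# Posterior for each hypothesis: P(A_i | B) = P(B | A_i) P(A_i) / P(B)
posteriors = [l * p / p_b for l, p in zip(likelihoods, priors)]
print(posteriors)  # [0.725..., 0.241..., 0.032...], sums to 1
```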
- P(A|B): the probability that A occurs given that B has occurred, i.e., the conditional probability of A given B. Because it is conditioned on the observed value of B, it is called the posterior probability of A.
- P(A): the prior (or marginal) probability of A. "Prior" means the probability of A without taking any information about B into account.
- P(B): the prior or marginal probability of B; it also serves as a normalizing constant.
- P(B|A): the conditional probability of B given that A has occurred. Because it is conditioned on the value of A, it is called the posterior probability of B.

Naive Bayes (see the Wikipedia article): Using Bayes' theorem, the conditional probability can be decomposed as

$$p(C_k \mid \mathbf{x}) = \frac{p(C_k)\,p(\mathbf{x} \mid C_k)}{p(\mathbf{x})}$$

The denominator does not depend on $C_k$ and is effectively constant; the numerator is the joint probability model $p(C_k, x_1, \ldots, x_n)$, which the chain rule factorizes as

$$\begin{aligned}
p(C_k, x_1, \ldots, x_n) &= p(x_1, \ldots, x_n, C_k) \\
&= p(x_1 \mid x_2, \ldots, x_n, C_k)\,p(x_2, \ldots, x_n, C_k) \\
&= p(x_1 \mid x_2, \ldots, x_n, C_k)\,p(x_2 \mid x_3, \ldots, x_n, C_k)\,p(x_3, \ldots, x_n, C_k) \\
&= \ldots \\
&= p(x_1 \mid x_2, \ldots, x_n, C_k)\,p(x_2 \mid x_3, \ldots, x_n, C_k) \cdots p(x_{n-1} \mid x_n, C_k)\,p(x_n \mid C_k)\,p(C_k)
\end{aligned}$$

The "naive" conditional independence assumption is that all features in $\mathbf{x}$ are mutually independent, conditional on the category $C_k$:

$$p(x_i \mid x_{i+1}, \ldots, x_n, C_k) = p(x_i \mid C_k)$$

Thus the joint model can be expressed as

$$\begin{aligned}
p(C_k \mid x_1, \ldots, x_n) &\propto p(C_k, x_1, \ldots, x_n) \\
&= p(C_k)\,p(x_1 \mid C_k)\,p(x_2 \mid C_k)\,p(x_3 \mid C_k) \cdots \\
&= p(C_k) \prod_{i=1}^{n} p(x_i \mid C_k)
\end{aligned}$$

A minimal classifier built on this product rule is sketched below.
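The following is a from-scratch sketch of a Bernoulli naive Bayes classifier, assuming binary features; the toy matrix X and labels y are hypothetical. It evaluates $\log p(C_k) + \sum_i \log p(x_i \mid C_k)$ in log space to avoid underflow, with Laplace smoothing to avoid zero probabilities.

```python
import numpy as np

# Hypothetical toy data: 4 samples, 3 binary features, 2 classes.
X = np.array([[1, 0, 1],
              [1, 1, 0],
              [0, 0, 1],
              [0, 1, 0]])
y = np.array([0, 0, 1, 1])

classes = np.unique(y)
log_priors = {}     # log p(C_k)
feature_probs = {}  # p(x_i = 1 | C_k)

for c in classes:
    Xc = X[y == c]
    log_priors[c] = np.log(len(Xc) / len(X))
    # Laplace smoothing keeps every p(x_i | C_k) strictly in (0, 1)
    feature_probs[c] = (Xc.sum(axis=0) + 1) / (len(Xc) + 2)

def predict(x):
    # Score each class by log p(C_k) + sum_i log p(x_i | C_k)
    scores = {}
    for c in classes:
        p = feature_probs[c]
        log_lik = np.sum(x * np.log(p) + (1 - x) * np.log(1 - p))
        scores[c] = log_priors[c] + log_lik
    return max(scores, key=scores.get)

print(predict(np.array([1, 0, 1])))  # -> 0 on this toy data
```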
Differences and connections between LR and linear regression:

- Both logistic regression and linear regression are generalized linear models.
- The optimization objective of linear regression is least squares, whereas logistic regression maximizes a likelihood function.
- Linear regression outputs a continuous value on the real line; LR's output is mapped into [0, 1] by a sigmoid function and converted to a class label by thresholding.
- Linear regression tries to fit the training data, predicting the target as a linear weighted combination of the features; logistic regression trains a maximum-likelihood classifier.

Both contrasts are illustrated in the sketch below.
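This sketch contrasts the two models on a single made-up data point: the same linear score $w \cdot x + b$ is the final output for linear regression, while LR squashes it through a sigmoid and thresholds it, and each model measures its error with a different loss. The weights, inputs, and targets are all illustration values.

```python
import numpy as np

w = np.array([0.8, -0.4])   # shared linear weights (made up)
b = 0.1
x = np.array([2.0, 1.0])

score = w @ x + b           # linear regression output: any real value
print(score)                # 1.3

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

prob = sigmoid(score)       # LR output: squashed into (0, 1)
label = int(prob >= 0.5)    # thresholding turns it into a class
print(prob, label)          # ~0.786, 1

# The differing objectives on one labeled example:
y_cont, y_cls = 1.0, 1
sq_loss = (score - y_cont) ** 2   # least squares (linear regression)
nll = -(y_cls * np.log(prob) + (1 - y_cls) * np.log(1 - prob))  # negative log-likelihood (LR)
```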