Least Angle Regression

xiaoxiao2022-07-03 131

文章目录

引一些基本的假设 LARS算法算法与别的方法结合LARS与LASSO的关系LARS 与 Stagewise 代码

Efron B, Hastie T, Johnstone I M, et al. Least angle regression[J]. Annals of Statistics, 2004, 32(2): 407-499.

引

在回归分析中，我们常常需要选取部分特征，而不是全都要，所以有前向法，后退法之类的，再根据一些指标设置停止准则。作者提出了一种LARS的算法，能够在有限步迭代后获得很好的结果，而且这种算法能够和LASSO和stagewise结合，加速他们的算法。在我看来，更为重要的是，其背后的几何解释。

可惜的是，证明实在太多，这方面的只是现在也不想去回顾了，就只能孤陋地把一些简单的东西记一下了。

一些基本的假设

上表，是作者进行比较实验的数据集，注意上面的符号：令 $X$ 表示变量集，以 $x_{ij}$ 表示其中的第ij个元素，即第i个病人的第j个指标，用 $x_i$ 表示第i个协变量，就是矩阵 $X$ 的第i列，用 $y$ 来表示应变量，且假设(事实上进行预处理，标准化）：线性回归，其系数假设为 $\beta=(\beta_1, \beta_2, \ldots, \beta_m)'$ ,给出预测向量 $\mu$ ，则： $\mu = \sum \limits_{j=1}^m x_j \beta_j = X\beta, \quad [X_{n\times m} = (x_1, x_2, \ldots, x_m)]$ 均方误差为： $S(\beta) = \|y-\mu\|^2 = \sum \limits_{i=1}^n (y_i - \mu_i)^2$

用 $T(\beta)$ 表示 $\beta$ 的 $\ell_1$ 范数： $T(\beta) = \sum \limits_{j=1}^m |\beta_j|.$ 那么，Lasso通过下式求解：而Forward Stagewise，以 $\widehat{\mu}=0$ 开始，令：表示当前X与 $y-\widehat{\mu}$ 的关系，其中 $X^{'}$ 表示 $X$ 的转置。下一步，前进法选择当前关系中最大的部分：注意到: $\widehat{c}_j = x_j'(y - \widehat{\mu}_{old})-\epsilon \cdot sign(\widehat{c}_{\hat{j}})=\widehat{c}_{\hat{j}}-\epsilon \cdot sign(\widehat{c}_{\hat{j}})$ 所以，如果 $\epsilon=|\widehat{c}_{\hat{j}}|$ ，那么 $\widehat{c}_j=0$ ，这个时候，stagewise就成了普通的前进法了，这个方法是过拟合的（不懂啊，照本宣科，所以 $\epsilon$ 改怎么选，我也不知道啊）。

文章给了俩种方法的一个比较：

俩个的效果是差不多的，但是，需要注意的是，这俩种方法的迭代次数是不定的。

LARS算法

我们从2个特征开始讨论， $\bar{y}$ 是 $y$ 在 $x_1, x_2$ 子空间的投影，以 $\hat{\mu}_0=0$ 为起点，此时 $\bar{y}_2$ 与 $x_1$ 的角度更加小(以锐角为准），所以， $\hat{\mu}_1=\gamma x_1$ ，相当于我们选取了特征 $x_1$ , 也就是说 $\beta_1=\gamma$ 。现在的问题是， $\gamma$ 该怎么选呢，LARS是这么选的， $\gamma$ 使得 $\bar{y}_2-\hat{\mu}_1$ 与 $x_1, x_2$ 的角度相等，因为这个点俩个特征的重要性是一致的。接着 $\hat{\mu}_2=\hat{\mu}_1+\gamma_2 u_2$ , $\gamma_2u_2=\bar{y}_2-\hat{\mu}_1$ 。这么做，我们将 $y$ 在子空间中的投影 $\bar{y}_2$ 抵消了。

为了将这种思想推广到更多特征，需要介绍一些符号和公式。

注意，公式(2.6)中的 $G_{\mathcal{A}}^{-1}=\mathcal{G_A^{-1}}$ 。

假设 $\widehat{\mu}_{\mathcal{A}}$ 是当前的LARS的估计，那么：指示集 $\mathcal{A}$ 是由下式定义的：令：注意，这样子，就能令 $s_jx_j'(y-\widehat{\mu}_{\mathcal{A}})=|\widehat{c}_j|, j\in \mathcal{A}$ 用 $X_{\mathcal{A}}, A_{\mathcal{A}}, u_{\mathcal{A}}$ 依照上面定义，令：于是，下一步的更新为：现在的问题是 $\widehat{\gamma}$ 应该怎么选，我们先来考察 $c_j(\gamma)$ : 所以，对于 $\in \mathcal{A}$ , $c_j(\gamma)$ 的变换程度是一致的： $\gamma \ge 0, A_{\mathcal{A}}\ge0$ ,(后者是公式所得，前者是为了减少相关度所必须的)，所以，当 $\gamma$ 从0开始慢慢增大的时候， $|c_j(\gamma)|$ 也会慢慢变小，到一定程度，势必会有 $\in \mathcal{A}^c$ , 使得 $\widehat{C}-\gamma A_{\mathcal{A}} \le |c_j(\gamma)|$ ，换句话说，我们只要找到 $\widehat{C}-\gamma A_{\mathcal{A}} = |c_j(\gamma)|, j \in \mathcal{A}^c$ 的最小 $\gamma$ ,且 $\gamma > 0$ ，否则， $\gamma=0$ ： $\widehat{C}-\gamma A_{\mathcal{A}} = |c_j(\gamma)|, j \in \mathcal{A}^c \\ \Rightarrow \quad \widehat{C}-\gamma A_{\mathcal{A}}=\widehat{c}_j-\gamma a_j | -\widehat{c}_j+\gamma a_j \\ \Rightarrow \gamma = \frac{\widehat{C}-\widehat{c}_j}{A_{\mathcal{A}}-a_j} |\frac{\widehat{C}-\widehat{c}_j}{A_{\mathcal{A}}+a_j}$ 总结来说，就是: 其中 $+$ 表示，如果后半部分的结果均小于0，那么 $\widehat{\gamma}=0$ 。这么做，我们就将 $c_j(\gamma)$ 一直降，降到有一个和他们一样，假设这个 $\in \mathcal{A}^c$ 为 $j^*$ ,于是下一步更新的时候，我们需要将这个加入到 $\mathcal{A}$ , $\mathcal{A}= \mathcal{A} \cup \{j^*\}$ 。

假设在LARS的第k步：用 $X_k, \mathcal{G}_k, A_k$ 和 $\mu_k$ 是第k步的类似的定义，注意，我们省略了 $\mathcal{A}$ 。用 $\bar{y}_k$ 来表示 $y$ 在子空间 $\mathcal{L}(X_k)$ 的投影，既然 $\widehat{\mu}_{k-1}\in \mathcal{L}(X_{k-1})$ （因为起点为0），所以：第一个等式后的第二项是 $y-\widehat{\mu}_{k-1}$ 在子空间 $\mathcal{L}(X_k)$ 中的投影，第二个等式从(2.6)可以推得： $u_k = X_k A_k \mathcal{G}_k^{-1}1_k$ 又根据(2.18）便可得证。根据(2.19)可得：又 $\widehat{\gamma}_ku_k = \widehat{\mu}_k-\widehat{\mu}_{k-1}$ ：这说明，每一次更新时，变化的向量时沿着 $\bar(y)_k-\widehat{\mu}_{k-1}$ 的。我们通过一个图片来展示：

上图，我们要处理的时 $\bar{y}_3$ ，在以及处理 $\bar{y}_2$ 的基础上，我们看到，最后变化的量是在 $\bar{y}_3-\widehat{\mu}_2$ 方向上。

算法

顺便整理下算法吧，以便以后用，符号就用自己的了：

Input: 标准化后的 $[x_1, x_2, \ldots, x_m]$ 和 $y=[y_1, \ldots, y_n]^T$ , 特征数 $r$ ; 令： $\mu_{\mathcal{A}}=0, \beta=[0, 0, \ldots, 0] \in \mathbb{R}^m$ ; 计算 $c = X^Ty$ , 找出其中绝对值最大的元素，令其指标集为 $\mathcal{A}$ ,最大值为 $C$ ，令 $X_{\mathcal{A}}=[\ldots, s_j x_j, \ldots]_{j \in \mathcal{A}}$ , $\mathcal{G_A}=X_{\mathcal{A}}^TX_{\mathcal{A}}, A_{\mathcal{A}}=(1_\mathcal{A}^T\mathcal{G_A}^{-1}1_{\mathcal{A}})^{-1/2}$ , $\mathrm{u}_{\mathcal{A}}=X_{\mathcal{A}}w_{\mathcal{A}},w_{\mathcal{A}}=A_{\mathcal{A}}\mathcal{G_A}^{-1}1_{\mathcal{A}}$ For $\ldots, r$ : 1. 根据公式(2.13)计算 $\widehat{\gamma}$ ,记录相应的 $j$ ,如果 $\widehat{\gamma}=0$ ，停止迭代。 2. $\mu_\mathcal{A}=\mu_{\mathcal{A}}+\widehat{\gamma}\mathrm{u}_{\mathcal{A}}$ 3. $\beta = \beta+\widehat{\gamma}w_{\mathcal{A}}\otimes s_{\mathcal{A}}$ 4. 更新 $\mathcal{A}=\mathcal{A} \cup \{j\}$ , $C=C-\widehat{\gamma}A_{\mathcal{A}}$ , $c=X^T(y-\mu_\mathcal{A})$ 5. 更新 $X_{\mathcal{A}},\mathcal{G_A}, A_{\mathcal{A}}, \mathrm{u}_{\mathcal{A}}$ 输出： $\beta, \mu_{\mathcal{A}}$

注意，上面的 $w_\mathcal{A}\otimes s_\mathcal{A}$ 表示对于元素相乘, $s_{\mathcal{A}}$ 表示对应的符号。还有，如果 $r = m$ ，那么上面的迭代只能进行到 $r - 1$ 步，最后一步可以根据公式(2.19)的分解来，在代码中予以了实现。

不过，利用代码进行实验的时候，发现这俩个好像不大一样

我感觉没有错。

与别的方法结合

LARS与LASSO的关系

通过对 $\gamma$ 的调整，利用LARS也能求解LASSO，证明并没有去看。

可以证明，如果 $\widehat{\beta}$ 是通过LASSO求得的解，那么：

令 $d_j = s_jw_{\mathcal{A}j}, j\in \mathcal{A}$ ,那么对于任意的 $\in \mathcal{A}$ ：因此， $\beta_j({\gamma})$ 改变符号发生在：第一次改变符号发生在：如果，所有的 $\gamma_j$ 均小于0，那么 $\widetilde{\gamma}=+\infty$ 。也就是说，如果 $\widetilde{\gamma}< \widehat{\gamma}$ ，为了使得 $\beta$ 和 $c$ 符号保持一致，我们应当选择前者作为此次的更新步长，同时将 $j$ 从 $\mathcal{A}$ 中移除。

LARS 与 Stagewise

代码

import numpy as np import matplotlib.pyplot as plt class LARS_LASSO: def __init__(self, data, response): self.__data = data self.__response = response self.n, self.m = self.data.shape self.mu = np.zeros(self.n, dtype=float) self.beta = np.zeros(self.m, dtype=float) self.compute_c() self.compute_index() self.compute_basic() self.progress_beta = [] self.progress_mu = [] @property def data(self): return self.__data @property def response(self): return self.__response def compute_c(self): """计算关系度c""" self.c = self.data.T @ (self.response-self.mu) def compute_index(self): """找出最大值C和指标集A，以及sj""" self.index = [np.argmax(np.abs(self.c))] newc = self.c[self.index] self.maxC = np.abs(newc[0]) sign = lambda x: 1. if x >= 0 else -1. self.s = np.array( [sign(item) for item in newc], dtype=float ) def compute_basic(self): """计算一些基本的东西 index_A: A_A index_w: w_A index_u: u_A """ index_X = self.data[:, self.index] * self.s index_G = index_X.T @ index_X index_G_inv = np.linalg.inv(index_G) self.index_A = 1 / np.sqrt(np.sum(index_G_inv)) self.index_w = np.sum(index_G_inv, 1) * self.index_A self.index_u = index_X @ self.index_w def update_c(self): """更新c""" self.compute_c() def update_index(self, j): """更新指示集合 index: 指示集合A maxC: 最大的c s: 符号 """ if j in self.index: self.index.remove(j) else: self.index.append(j) self.index.sort() newc = self.c[self.index] self.maxC = np.abs(newc[0]) sign = lambda x: 1. if x >= 0 else -1. self.s = np.array( [sign(item) for item in newc], dtype=float ) def update_basic(self): """更新基本的东西""" self.compute_basic() def current_gamma(self): """找第一次改变符号的位置""" const = 999999999. d = self.s * self.index_w index_beta = self.beta[self.index] z = [] for i in range(len(d)): if -index_beta[i] * d[i] <= 0: z.append(const) else: z.append(-index_beta[i] / d[i]) z = np.array(z, dtype=float) label = np.argmin(z) themin = z[label] return themin, self.index[label] def step(self): """操作一步""" const = 9999999999. def divide(x, y): z = [] for i in range(len(x)): if x[i] * y[i] <= 0: z.append(const) else: z.append(x[i] / y[i]) return z complement_index = list(set(range(self.m)) - set(self.index)) a = self.data.T @ self.index_u complement_a = a[complement_index] complement_c = self.c[complement_index] index_reduce_a = self.index_A - complement_a index_plus_a = self.index_A + complement_a maxC_reduce_c = self.maxC - complement_c maxc_plus_c = self.maxC + complement_c min1 = divide(maxC_reduce_c, index_reduce_a) min2 = divide(maxc_plus_c, index_plus_a) totalmin = np.array( [min1, min2] ) allmin = np.min(totalmin, 0) min_beta, label2 = self.current_gamma() self.progress_beta.append(np.array(self.beta)) self.progress_mu.append(np.array(self.mu)) try: label = np.argmin(allmin) except ValueError: index_X = self.data[:, self.index] * self.s index_G = index_X.T @ index_X index_G_inv = np.linalg.inv(index_G) deltau = index_G_inv @ index_X.T @ (self.response - self.mu) self.mu = self.mu + index_X @ deltau self.beta = self.beta + deltau * self.s return 0 print(min_beta, allmin[label]) if min_beta < allmin[label]: gamma = min_beta j = label2 else: gamma = 0. if allmin[label] == const else allmin[label] j = complement_index[label] self.mu = self.mu + gamma * self.index_u self.beta[self.index] = self.beta[self.index] + (self.s * self.index_w) * gamma if self.life == 0: return 1 self.update_c() self.update_index(j) self.update_basic() return 1 def process(self, r=1): self.life = r for i in range(r): self.life -= 1 print("step:", i) self.step() self.progress_beta.append(np.array(self.beta)) self.progress_mu.append(np.array(self.mu)) index_X = self.data[:, self.index] * self.s index_G = index_X.T @ index_X index_G_inv = np.linalg.inv(index_G) deltau = index_G_inv @ index_X.T @ (self.response - self.mu) self.mu = self.mu + index_X @ deltau self.beta[self.index] = self.beta[self.index] + deltau * self.s self.progress_beta.append(np.array(self.beta)) self.progress_mu.append(np.array(self.mu)) def plot(self): """plot beta, error""" fig, ax = plt.subplots(nrows=1, ncols=2, figsize=(10, 5), constrained_layout=True) beta = np.array(self.progress_beta) mu = np.array(self.progress_mu) r, m = beta.shape error = np.sum((mu - self.response) ** 2, 1) x = np.arange(1, r+1) for i in range(m): y = beta[:, i] ax[0].plot(x, y, label="feature{0}".format(i)) ax[0].text(x[-1]+0.05, y[-1], str(i)) ax[0].set_title(r"$\beta$ with iterations") ax[0].set_xlabel(r"iterations") ax[0].set_ylabel(r"$\beta$") ax[0].legend(loc="best", ncol=2) ax[1].plot(x, error) ax[1].set_title("square error with iterations") ax[1].set_xlabel("iterations") ax[1].set_ylabel("square error") plt.show() data1 = np.loadtxt("C:\\Users\\pkavs\\Desktop\\diabetes.txt", dtype=float) mu = np.mean(data1, 0) std = np.std(data1, 0) data1 = (data1 - mu) / std data = data1[:, :10] response = data1[:, 10] test = LARS_LASSO(data, response) test.process(r=7) test.plot() print(test.progress_beta) """ 跟论文有出路，实验的时候并没有删除的过程，好像是要在全部特征的基础上，再进行一步，不过机制不想改了，就这样吧 """ import numpy as np import matplotlib.pyplot as plt class LARS_LASSO: def __init__(self, data, response): self.__data = data self.__response = response self.n, self.m = self.data.shape self.mu = np.zeros(self.n, dtype=float) self.beta = np.zeros(self.m, dtype=float) self.compute_c() self.compute_index() self.compute_basic() self.progress_beta = [] self.progress_mu = [] @property def data(self): return self.__data @property def response(self): return self.__response def compute_c(self): """计算关系度c""" self.c = self.data.T @ (self.response-self.mu) def compute_index(self): """找出最大值C和指标集A，以及sj""" self.index = [np.argmax(np.abs(self.c))] newc = self.c[self.index] self.maxC = np.abs(newc[0]) sign = lambda x: 1. if x >= 0 else -1. self.s = np.array( [sign(item) for item in newc], dtype=float ) def compute_basic(self): """计算一些基本的东西 index_A: A_A index_w: w_A index_u: u_A """ index_X = self.data[:, self.index] * self.s index_G = index_X.T @ index_X index_G_inv = np.linalg.inv(index_G) self.index_A = 1 / np.sqrt(np.sum(index_G_inv)) self.index_w = np.sum(index_G_inv, 1) * self.index_A self.index_u = index_X @ self.index_w def update_c(self): """更新c""" self.compute_c() def update_index(self, j): """更新指示集合 index: 指示集合A maxC: 最大的c s: 符号 """ if j in self.index: self.index.remove(j) else: self.index.append(j) self.index.sort() newc = self.c[self.index] self.maxC = np.abs(newc[0]) sign = lambda x: 1. if x >= 0 else -1. self.s = np.array( [sign(item) for item in newc], dtype=float ) def update_basic(self): """更新基本的东西""" self.compute_basic() def current_gamma(self): """找第一次改变符号的位置""" const = 999999999. d = self.s * self.index_w index_beta = self.beta[self.index] z = [] for i in range(len(d)): if -index_beta[i] * d[i] <= 0: z.append(const) else: z.append(-index_beta[i] / d[i]) z = np.array(z, dtype=float) label = np.argmin(z) themin = z[label] return themin, self.index[label] def step(self): """操作一步""" const = 9999999999. def divide(x, y): z = [] for i in range(len(x)): if x[i] * y[i] <= 0: z.append(const) else: z.append(x[i] / y[i]) return z complement_index = list(set(range(self.m)) - set(self.index)) a = self.data.T @ self.index_u complement_a = a[complement_index] complement_c = self.c[complement_index] index_reduce_a = self.index_A - complement_a index_plus_a = self.index_A + complement_a maxC_reduce_c = self.maxC - complement_c maxc_plus_c = self.maxC + complement_c min1 = divide(maxC_reduce_c, index_reduce_a) min2 = divide(maxc_plus_c, index_plus_a) totalmin = np.array( [min1, min2] ) allmin = np.min(totalmin, 0) min_beta, label2 = self.current_gamma() print(len(self.progress_beta)) self.progress_beta.append(np.array(self.beta)) self.progress_mu.append(np.array(self.mu)) try: label = np.argmin(allmin) except ValueError: index_X = self.data[:, self.index] * self.s index_G = index_X.T @ index_X index_G_inv = np.linalg.inv(index_G) deltau = index_G_inv @ index_X.T @ (self.response - self.mu) self.mu = self.mu + index_X @ deltau self.beta = self.beta + deltau * self.s return 0 if min_beta < allmin[label]: gamma = min_beta label = label2 else: gamma = 0. if allmin[label] == const else allmin[label] self.mu = self.mu + gamma * self.index_u self.beta[self.index] = self.beta[self.index] + (self.s * self.index_w) * gamma if self.life == 0: return 1 j = complement_index[label] self.update_c() self.update_index(j) self.update_basic() return 1 def process(self, r=1): self.life = r for i in range(r): self.life -= 1 print("step:", i) self.step() self.progress_beta.append(np.array(self.beta)) self.progress_mu.append(np.array(self.mu)) index_X = self.data[:, self.index] * self.s index_G = index_X.T @ index_X index_G_inv = np.linalg.inv(index_G) deltau = index_G_inv @ index_X.T @ (self.response - self.mu) self.mu = self.mu + index_X @ deltau self.beta[self.index] = self.beta[self.index] + deltau * self.s self.progress_beta.append(np.array(self.beta)) self.progress_mu.append(np.array(self.mu)) def plot(self): """plot beta, error""" fig, ax = plt.subplots(nrows=1, ncols=2, figsize=(10, 5), constrained_layout=True) beta = np.array(self.progress_beta) mu = np.array(self.progress_mu) r, m = beta.shape error = np.sum((mu - self.response) ** 2, 1) x = np.arange(1, r+1) for i in range(m): y = beta[:, i] ax[0].plot(x, y, label="feature{0}".format(i)) ax[0].text(x[-1]+0.05, y[-1], str(i)) ax[0].set_title(r"$\beta$ with iterations") ax[0].set_xlabel(r"iterations") ax[0].set_ylabel(r"$\beta$") ax[0].legend(loc="best", ncol=2) ax[1].plot(x, error) ax[1].set_title("square error with iterations") ax[1].set_xlabel("iterations") ax[1].set_ylabel("square error") plt.show()

最新回复(0)