Forward Stepwise Regression in R

This article walks through forward stepwise regression in R, with a complete worked example.


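“During modeling, choosing a suitable feature set helps control model complexity and guard against problems such as overfitting. One way to find the best feature set is to try every combination of columns and keep the one that performs best, but that requires a great deal of computation. The forward stepwise regression introduced here is a modification of ordinary least squares. Instead of trying every combination, it builds the feature set greedily, which greatly reduces the computation while still yielding a good feature set and helping to curb overfitting.”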
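To make the saving concrete, here is a small arithmetic sketch. It assumes 11 candidate attributes, as in the wine data used later in the article, and counts one linear model fit per candidate subset; the variable names are illustrative only.

```r
p <- 11                      # number of candidate attributes

# Exhaustive search: one model for every non-empty subset of the columns.
all_subsets <- 2^p - 1

# Forward stepwise: p fits in the first pass, p - 1 in the second, and so on.
forward_fits <- sum(p:1)

cat("Exhaustive search:", all_subsets, "model fits\n")
cat("Forward stepwise: ", forward_fits, "model fits\n")
```

With 11 attributes this is 2047 fits against 66, and the gap widens quickly as the number of columns grows.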

Forward stepwise regression

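Forward stepwise regression works as follows. First, evaluate every single-column subset of the attributes and keep the column that gives the best model. Then look for the second column that works best in combination with it, rather than evaluating all two-column subsets. Continue in the same way: every candidate subset in a pass contains the best subset found in the previous pass. Each pass therefore adds one new attribute to the feature set, until there are no attributes left to add.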
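The procedure described above can be sketched as a small reusable function. This is a minimal sketch, not the article's own code: the names `forward_select` and `rmse` and the synthetic data are assumptions made for illustration, and held-out RMSE is assumed as the measure of "best".

```r
# Root-mean-square error between observed and predicted values.
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

# Greedy forward selection: in each pass, add the column that gives the
# lowest held-out RMSE when combined with the columns already chosen.
forward_select <- function(x_train, y_train, x_test, y_test) {
  chosen <- integer(0)
  errors <- numeric(0)
  for (pass in seq_len(ncol(x_train))) {
    candidates <- setdiff(seq_len(ncol(x_train)), chosen)
    cand_err <- sapply(candidates, function(j) {
      cols <- c(chosen, j)
      fit <- lm(y ~ ., data = cbind(y = y_train, x_train[, cols, drop = FALSE]))
      rmse(y_test, predict(fit, x_test[, cols, drop = FALSE]))
    })
    best <- which.min(cand_err)
    chosen <- c(chosen, candidates[best])
    errors <- c(errors, cand_err[best])
  }
  list(order = chosen, rmse = errors)
}

# Synthetic data (hypothetical): y depends strongly on x1, less on x3,
# and not at all on x2 or x4.
set.seed(1)
n <- 300
x <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
y <- 3 * x$x1 - 2 * x$x3 + rnorm(n, sd = 0.1)
test_rows <- seq(3, n, by = 3)   # hold out every third row

result <- forward_select(x[-test_rows, ], y[-test_rows],
                         x[test_rows, ], y[test_rows])
print(names(x)[result$order])    # selection order
print(round(result$rmse, 3))     # held-out RMSE after each pass
```

On this toy data the two informative columns are picked first, and the error drops sharply once both are in the model.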

Example code

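1. Import and split the data. Import the data, hold out every third row (about one third of the data) as the test set, and use the remaining two thirds as the training set. Features and labels are stored separately.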

target.url <- "http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv"
data <- read.csv(target.url, header = TRUE, sep = ";")
# divide data into training and test sets: every third row goes to the test set
index <- which((1:nrow(data)) %% 3 == 0)
train <- data[-index, ]
test <- data[index, ]
# split the data into feature sets (columns 1-11) and label vectors (column 12)
trainlist <- train[, 1:11]
testlist <- test[, 1:11]
trainlabel <- train[, 12]
testlabel <- test[, 12]
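The code above takes every third row for testing, which is deterministic and gives a one-third test set. If a random 70%/30% split is wanted instead, it could be sketched as follows. The data frame `toy` is a hypothetical stand-in for the wine data so that the example runs without a download, and the variable names are illustrative.

```r
# Hypothetical stand-in for the wine data: 100 rows, 3 features, 1 label.
set.seed(42)
toy <- data.frame(f1 = rnorm(100), f2 = rnorm(100), f3 = rnorm(100),
                  quality = sample(3:8, 100, replace = TRUE))

# Draw 70% of the row numbers at random for the training set.
train_rows <- sample(nrow(toy), size = round(0.7 * nrow(toy)))
toy_train <- toy[train_rows, ]
toy_test  <- toy[-train_rows, ]

cat("train rows:", nrow(toy_train), " test rows:", nrow(toy_test), "\n")
```

Fixing the seed with `set.seed()` keeps the random split reproducible between runs.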

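2. Build the output feature set with forward stepwise regression. A for loop iterates over the attributes, starting from an empty subset. Each candidate attribute is added to the current subset in turn, a linear regression is fitted, and its error on the test set is computed. In each pass, the column with the smallest error is added to the output feature set. The result is the list of selected column indices and the corresponding attribute names.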

# rmse() is not part of base R; it is defined here so the script is self-contained
# (argument order: observed values, then predictions)
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

# build list of attributes one at a time, starting with an empty list
attributeList <- numeric(0)
index <- 1:ncol(trainlist)
oosError <- numeric(0)
for (i in index) {
  # attributes not in list already
  attTry <- setdiff(index, attributeList)
  # try each attribute not in the set to see which one gives the least out-of-sample error
  errorList <- numeric(0)
  for (ii in attTry) {
    attTemp <- c(attributeList, ii)
    # drop = FALSE keeps a one-column selection as a data frame with its column name
    xTrainTemp <- trainlist[, attTemp, drop = FALSE]
    xTestTemp <- testlist[, attTemp, drop = FALSE]
    lm.mod <- lm(trainlabel ~ ., data = xTrainTemp)
    rmsError <- rmse(testlabel, predict(lm.mod, xTestTemp))
    errorList <- append(errorList, rmsError)
  }
  # keep the attribute with the lowest error in this pass
  iBest <- which.min(errorList)
  attributeList <- append(attributeList, attTry[iBest])
  oosError <- append(oosError, errorList[iBest])
}
cat("Best attribute indices: ", attributeList, "\n", "Best attribute names: \n", names(trainlist[attributeList]))

The resulting indices and names are:

Best attribute indices: 11 2 10 7 6 9 1 8 4 3 5
Best attribute names:
alcohol volatile.acidity sulphates total.sulfur.dioxide free.sulfur.dioxide pH fixed.acidity density residual.sugar citric.acid chlorides

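The order of the attribute names is the order in which the attributes were selected, so it can also be read as a ranking of their importance. Knowing which attributes matter most makes the model easier to interpret.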
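For comparison, base R ships its own forward selection in `step()` from the stats package. Note that this is a different technique from the loop above: `step()` ranks candidates by AIC computed on the training data, not by RMSE on a held-out test set, so the two can select different attributes. The sketch below uses the built-in `mtcars` data set and an arbitrary choice of four candidate columns, both assumptions made only to keep the example self-contained.

```r
# mtcars ships with R; mpg is the response and four columns are candidates.
null_model <- lm(mpg ~ 1, data = mtcars)

aic_model <- step(null_model,
                  scope = list(lower = ~ 1, upper = ~ wt + hp + disp + qsec),
                  direction = "forward",
                  trace = 0)   # trace = 0 suppresses the per-step log

# The "anova" component records the steps taken during the search.
print(aic_model$anova)
```

A term is only added while doing so lowers the AIC, so `step()` may stop before all candidates are used.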

3. Evaluate the model. Plot three figures: RMSE against the number of attributes, a histogram of the prediction errors of the forward stepwise model, and a scatter plot of actual against predicted labels.

plot(oosError, type = "l", xlab = "Number of Attributes", ylab = "RMS Error", main = "error versus number of attributes")
# keep the attributes up to the point where the test error is lowest
nBest <- which.min(oosError)
finaltrain <- trainlist[, attributeList[1:nBest], drop = FALSE]
finaltest <- testlist[, attributeList[1:nBest], drop = FALSE]
lm.finalmol <- lm(trainlabel ~ ., data = finaltrain)
finalpre <- predict(lm.finalmol, finaltest)
# histogram of the prediction errors on the test set
errorVector <- testlabel - finalpre
hist(errorVector)
# actual versus predicted quality score
plot(finalpre, testlabel, xlab = "Predicted Taste Score", ylab = "Actual Taste Score")


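The error curve shows that the error keeps falling as the first nine attributes are added and starts to rise once the tenth is included. We therefore take the first nine entries of the output feature set as the final feature set. In the scatter plot, predictions are best for scores of 5 and 6, where the points pile up most densely (the depth of color in a region reflects how many points overlap there). In general, machine learning algorithms predict poorly for data at the edges of the range. Because the true labels are integers, the points in the scatter plot fall into horizontal bands. The shapes of the histogram and the scatter plot can both be analyzed for ways to improve the model.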
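The cutoff described above can be read off programmatically with `which.min()`. In the sketch below the selection order is the one reported in the article's output, but the error values are hypothetical: they only mimic the described shape (falling for nine attributes, rising from the tenth) and are not the article's results.

```r
# Selection order reported in the article's output.
attribute_order <- c(11, 2, 10, 7, 6, 9, 1, 8, 4, 3, 5)

# HYPOTHETICAL error curve: falls for nine attributes, rises from the tenth.
# These numbers are made up for illustration.
oos_error <- c(0.720, 0.690, 0.675, 0.668, 0.664, 0.661, 0.659, 0.658, 0.657,
               0.6575, 0.658)

# Number of attributes at which the held-out error is lowest.
n_best <- which.min(oos_error)
final_attributes <- attribute_order[seq_len(n_best)]

cat("Attributes kept:", n_best, "\n")
cat("Column indices: ", final_attributes, "\n")
```

When the curve is nearly flat around its minimum, as here, a smaller feature set with almost the same error may be a reasonable choice too.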
