import sys
# Training data set
# each element in x represents (x0, x1, x2)
x = [(1, 0., 3), (1, 1., 3), (1, 2., 3), (1, 3., 2), (1, 4., 4)]
# y[i] is the output of y = theta0 * x[0] + theta1 * x[1] + theta2 * x[2]
y = [95.364, 97.217205, 75.195834, 60.105519, 49.342380]
# convergence threshold
epsilon = 0.0001
# learning rate
alpha = 0.01
diff = [0, 0]
max_itor = 1000
error1 = 0
error0 = 0
cnt = 0
m = len(x)

# init the parameters to zero
theta0 = 0
theta1 = 0
theta2 = 0

while True:
    cnt = cnt + 1

    # update the parameters, one training example at a time
    for i in range(m):
        diff[0] = y[i] - (theta0 + theta1 * x[i][1] + theta2 * x[i][2])
        theta0 = theta0 + alpha * diff[0] * x[i][0]
        theta1 = theta1 + alpha * diff[0] * x[i][1]
        theta2 = theta2 + alpha * diff[0] * x[i][2]

    # calculate the cost function
    error1 = 0
    for lp in range(len(x)):
        error1 += (y[lp] - (theta0 + theta1 * x[lp][1] + theta2 * x[lp][2])) ** 2 / 2

    if abs(error1 - error0) < epsilon:
        break
    else:
        error0 = error1

    print(' theta0 : %f, theta1 : %f, theta2 : %f, error1 : %f' % (theta0, theta1, theta2, error1))

print('Done: theta0 : %f, theta1 : %f, theta2 : %f' % (theta0, theta1, theta2))
Introduction
This example feeds a set of inputs through a logistic-regression mapping so that the network's weight and bias converge to their optimal values, and then uses the trained model for prediction. The parameters are updated with gradient descent (GD), the loss is the cross-entropy, and training runs for 10,000 iterations.
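For reference (a standard formulation, not spelled out in the original post), one gradient-descent step on a cross-entropy loss L(w, b) updates the parameters as

    w \leftarrow w - \alpha\,\frac{\partial L}{\partial w}, \qquad
    b \leftarrow b - \alpha\,\frac{\partial L}{\partial b}

where \alpha is the learning rate (0.1 in the code below).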
2. Python code
#!/usr/bin/python
import numpy
import theano
import theano.tensor as T
rng=numpy.random
N=400
feats=784
# D[0]: an N x feats matrix of standard-normal random feature values
# D[1]: N random integer labels, either 0 or 1
D=(rng.randn(N,feats),rng.randint(size=N,low=0,high=2))
training_steps=10000
# declare symbolic variables
x=T.matrix('x')
y=T.vector('y')
w=theano.shared(rng.randn(feats),name='w') # w is shared for every input
b=theano.shared(0.,name='b') # b is shared too.
print('Initial model:')
print(w.get_value())
print(b.get_value())
# construct theano expressions,symbolic
p_1=1/(1+T.exp(-T.dot(x,w)-b)) # sigmoid function,probability of target being 1
prediction=p_1>0.5 # binarize the probability with 0.5 as the threshold
xent=-y*T.log(p_1)-(1-y)*T.log(1-p_1) # cross entropy
cost=xent.mean()+0.01*(w**2).sum() # cost: mean cross entropy plus L2 regularization
gw,gb=T.grad(cost,[w,b]) # gradients of the cost w.r.t. w and b
#compile
train=theano.function(inputs=[x,y],outputs=[prediction,xent],updates=((w,w-0.1*gw),(b,b-0.1*gb)))
predict=theano.function(inputs=[x],outputs=prediction)
# train
for i in range(training_steps):
    pred,err=train(D[0],D[1])
print('Final model:')
print(w.get_value())
print(b.get_value())
print('target values for D:')
print(D[1])
print('prediction on D:')
print(predict(D[0]))
print('newly generated data for test:')
test_input=rng.randn(30,feats)
print('result:')
print(predict(test_input))
3. Program walkthrough
As shown above, we first import the required libraries; Theano is a library for scientific computation. We then generate a random input matrix of size 400*784 and a random output vector of length 400 whose entries are binary. Because the target is binary, this is a logistic regression problem.
Next we initialize the weight and the bias, both as shared variables, with the weight initialized to small random values and the bias to 0, and print them.
We build only a single-layer network here, with the logistic sigmoid function as the activation. Multiplying the input by the weight and adding the bias gives the activation, a value in (0, 1), which is binarized with 0.5 as the threshold. We then form the cross-entropy and the cost function; the cross-entropy measures how far the predicted activation deviates from the true target. Finally we take the partial derivatives of the cost with respect to w and b. Everything up to this point is symbolic expression manipulation.
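For reference (not written out in the original post), the symbolic expressions above correspond to

    p_1 = \sigma(Xw + b) = \frac{1}{1 + e^{-(Xw + b)}}, \qquad
    \mathrm{xent}_i = -\,y_i \log p_{1,i} - (1 - y_i)\log(1 - p_{1,i})

    \mathrm{cost} = \frac{1}{N}\sum_{i=1}^{N} \mathrm{xent}_i + 0.01\,\lVert w \rVert^2

and the gradients returned by T.grad are

    g_w = \frac{1}{N} X^{\top}(p_1 - y) + 0.02\,w, \qquad
    g_b = \frac{1}{N}\sum_{i=1}^{N} (p_{1,i} - y_i)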
Next we compile with theano.function, which optimizes the computation graph for efficiency, obtaining the train function and the predict function, used for training and prediction respectively.
Then we train on the data for 10,000 steps, updating the parameters by gradient descent at every step. Once the model is trained, we can generate a new batch of inputs and run predictions on it.
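A quick sanity check one might append to the script (not part of the original code) is to measure the accuracy on the training data itself, comparing the binarized predictions with the true labels:

# not in the original script: fraction of training examples predicted correctly
train_accuracy = (predict(D[0]) == D[1]).mean()
print('training accuracy:', train_accuracy)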
In fact, running a logistic regression is quite simple: first specify the column to be predicted, then specify the columns the model uses as predictors, and the algorithm package takes care of the rest.
In this example the column to predict is admit, using gre, gpa and the dummy variables prestige_2, prestige_3 and prestige_4. prestige_1 serves as the baseline and is therefore excluded, to avoid multicollinearity and the dummy variable trap that comes from including every dummy level of a categorical variable.
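As a minimal sketch of that preprocessing step (not shown in the original post; the file name binary.csv and the use of pandas are assumptions, the column names follow the admissions data described above):

import pandas as pd

# assumed input file with columns: admit, gre, gpa, prestige
df = pd.read_csv('binary.csv')

# one dummy column per prestige level: prestige_1 ... prestige_4
dummies = pd.get_dummies(df['prestige'], prefix='prestige')

# drop prestige_1 as the baseline to avoid the dummy variable trap
data = pd.concat([df[['admit', 'gre', 'gpa']],
                  dummies.drop('prestige_1', axis=1)], axis=1)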
The indentation of the program is shown in the figure.
The Python code follows. Because the training set is small, batch gradient descent is used here instead of incremental (stochastic) gradient descent.
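For reference (not spelled out in the original post), the quantities the code below works with are the log-likelihood J, which is maximized, and the batch update rule

    h = \sigma(X\theta), \qquad
    J(\theta) = \sum_{i} \big[ y_i \log h_i + (1 - y_i)\log(1 - h_i) \big], \qquad
    \theta \leftarrow \theta + a\, X^{\top}(y - h)

where a is the learning rate.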
## author: lijiayan    date: 2016/10/27    name: logReg.py
from numpy import *
import matplotlib.pyplot as plt
def loadData(filename):
    data = loadtxt(filename)
    m, n = data.shape
    print('the number of examples:', m)
    print('the number of features:', n - 1)
    x = data[:, 0:n-1]
    y = data[:, n-1:n]
    return x, y
# the sigmoid function
def sigmoid(z):
    return 1.0 / (1 + exp(-z))
# the cost function (log-likelihood)
def costfunction(y, h):
    y = array(y)
    h = array(h)
    J = sum(y * log(h)) + sum((1 - y) * log(1 - h))
    return J
# the batch gradient descent algorithm
def gradescent(x, y):
    m, n = shape(x)            # m: number of training examples; n: number of features
    x = c_[ones(m), x]         # add x0
    x = mat(x)                 # convert to matrix
    y = mat(y)
    a = 0.0000025              # learning rate
    maxcycle = 4000
    theta = zeros((n + 1, 1))  # initial theta
    J = []
    for i in range(maxcycle):
        h = sigmoid(x * theta)
        theta = theta + a * (x.T) * (y - h)
        cost = costfunction(y, h)
        J.append(cost)
    plt.plot(J)
    plt.show()
    return theta, cost
# the stochastic gradient descent (m should be large if you want a good result)
def stocGraddescent(x, y):
    m, n = shape(x)            # m: number of training examples; n: number of features
    x = c_[ones(m), x]         # add x0
    x = mat(x)                 # convert to matrix
    y = mat(y)
    a = 0.01                   # learning rate
    theta = ones((n + 1, 1))   # initial theta
    J = []
    for i in range(m):
        h = sigmoid(x[i] * theta)
        theta = theta + a * x[i].transpose() * (y[i] - h)
        cost = costfunction(y, h)
        J.append(cost)
    plt.plot(J)
    plt.show()
    return theta, cost
# plot the decision boundary
def plotbestfit(x, y, theta):
    plt.plot(x[:, 0:1][where(y == 1)], x[:, 1:2][where(y == 1)], 'ro')
    plt.plot(x[:, 0:1][where(y != 1)], x[:, 1:2][where(y != 1)], 'bx')
    x1 = arange(-4, 4, 0.1)
    x2 = (-float(theta[0]) - float(theta[1]) * x1) / float(theta[2])
    plt.plot(x1, x2)
    plt.xlabel('x1')
    plt.ylabel('x2')
    plt.show()
def classifyVector(inX, theta):
    prob = sigmoid((inX * theta).sum(1))
    return where(prob >= 0.5, 1, 0)
def accuracy(x, y, theta):
    m = shape(y)[0]
    x = c_[ones(m), x]
    y_p = classifyVector(x, theta)
    accuracy = sum(y_p == y) / float(m)
    return accuracy
Calling the code above:
from logReg import *

x, y = loadData("horseColicTraining.txt")
theta, cost = gradescent(x, y)
print('J:', cost)
ac_train = accuracy(x, y, theta)
print('accuracy of the training examples:', ac_train)
x_test, y_test = loadData('horseColicTest.txt')
ac_test = accuracy(x_test, y_test, theta)
print('accuracy of the test examples:', ac_test)
Results with learning rate = 0.0000025 and 4,000 iterations:
Trend of the likelihood function (J = sum(y*log(h)) + sum((1-y)*log(1-h))). The likelihood is being maximized, and the result is generally only considered good once the curve has stabilized.
The figure below shows the computed results: the accuracy on the training set is 73% and on the test set 78%.
At this point I took a look at the data set and noticed that the features are on very different scales, so I decided to normalize them.
To apply the normalization, modify the loadData(filename) function:
def loadData(filename):
    data = loadtxt(filename)
    m, n = data.shape
    print('the number of examples:', m)
    print('the number of features:', n - 1)
    x = data[:, 0:n-1]
    max = x.max(0)
    min = x.min(0)
    x = (x - min) / ((max - min) * 1.0)   # min-max scaling to [0, 1]
    y = data[:, n-1:n]
    return x, y
Without normalization the learning rate had to be 0.0000025 (anything larger makes the iterations oscillate, because some features take very large values, so even a slightly larger learning rate causes large swings), and with such a small learning rate the cost had still not fully stabilized after 4,000 iterations. Once the features are normalized so that all of them lie between 0 and 1, the learning rate can be increased and the number of iterations greatly reduced. Below is the result with learning rate = 0.005 and 500 iterations:
At this point the accuracy on the training set is 72% and on the test set 73%.
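To reproduce that run, a small change not present in the original code is to expose the learning rate and the iteration count as parameters of gradescent (a sketch, assuming the definitions from logReg.py above are in scope):

# hypothetical variant: the original hard-codes a = 0.0000025 and maxcycle = 4000
def gradescent(x, y, a=0.005, maxcycle=500):
    m, n = shape(x)
    x = mat(c_[ones(m), x])   # add the x0 column and convert to matrix
    y = mat(y)
    theta = zeros((n + 1, 1))
    for i in range(maxcycle):
        h = sigmoid(x * theta)
        theta = theta + a * x.T * (y - h)
    return theta, costfunction(y, h)

# x, y = loadData("horseColicTraining.txt")   # loadData now applies min-max scaling
# theta, cost = gradescent(x, y, a=0.005, maxcycle=500)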
This example shows how important it is to normalize the features.