python基于ID3思想的決策樹-創(chuàng)新互聯(lián)

這是一個(gè)判斷海洋生物數(shù)據(jù)是否是魚類而構(gòu)建的基于ID3思想的決策樹，供大家參考，具體內(nèi)容如下

目前創(chuàng)新互聯(lián)建站已為千余家的企業(yè)提供了網(wǎng)站建設(shè)、域名、網(wǎng)頁空間、成都網(wǎng)站托管、企業(yè)網(wǎng)站設(shè)計(jì)、華陰網(wǎng)站維護(hù)等服務(wù)，公司將堅(jiān)持客戶導(dǎo)向、應(yīng)用為本的策略，正道將秉承"和諧、參與、激情"的文化，與客戶和合作伙伴齊心協(xié)力一起成長，共同發(fā)展。

# coding=utf-8
import operator
from math import log
import time


def createDataSet():
  dataSet = [[1, 1, 'yes'],
        [1, 1, 'yes'],
        [1, 0, 'no'],
        [0, 1, 'no'],
        [0, 1, 'no'],
        [0,0,'maybe']]
  labels = ['no surfaceing', 'flippers']
  return dataSet, labels


# 計(jì)算香農(nóng)熵
def calcShannonEnt(dataSet):
  numEntries = len(dataSet)
  labelCounts = {}
  for feaVec in dataSet:
    currentLabel = feaVec[-1]
    if currentLabel not in labelCounts:
      labelCounts[currentLabel] = 0
    labelCounts[currentLabel] += 1
  shannonEnt = 0.0
  for key in labelCounts:
    prob = float(labelCounts[key]) / numEntries
    shannonEnt -= prob * log(prob, 2)
  return shannonEnt


def splitDataSet(dataSet, axis, value):
  retDataSet = []
  for featVec in dataSet:
    if featVec[axis] == value:
      reducedFeatVec = featVec[:axis]
      reducedFeatVec.extend(featVec[axis + 1:])
      retDataSet.append(reducedFeatVec)
  return retDataSet


def chooseBestFeatureToSplit(dataSet):
  numFeatures = len(dataSet[0]) - 1 # 因?yàn)閿?shù)據(jù)集的最后一項(xiàng)是標(biāo)簽
  baseEntropy = calcShannonEnt(dataSet)
  bestInfoGain = 0.0
  bestFeature = -1
  for i in range(numFeatures):
    featList = [example[i] for example in dataSet]
    uniqueVals = set(featList)
    newEntropy = 0.0
    for value in uniqueVals:
      subDataSet = splitDataSet(dataSet, i, value)
      prob = len(subDataSet) / float(len(dataSet))
      newEntropy += prob * calcShannonEnt(subDataSet)
    infoGain = baseEntropy - newEntropy
    if infoGain > bestInfoGain:
      bestInfoGain = infoGain
      bestFeature = i
  return bestFeature


# 因?yàn)槲覀冞f歸構(gòu)建決策樹是根據(jù)屬性的消耗進(jìn)行計(jì)算的，所以可能會(huì)存在最后屬性用完了，但是分類
# 還是沒有算完，這時(shí)候就會(huì)采用多數(shù)表決的方式計(jì)算節(jié)點(diǎn)分類
def majorityCnt(classList):
  classCount = {}
  for vote in classList:
    if vote not in classCount.keys():
      classCount[vote] = 0
    classCount[vote] += 1
  return max(classCount)


def createTree(dataSet, labels):
  classList = [example[-1] for example in dataSet]
  if classList.count(classList[0]) == len(classList): # 類別相同則停止劃分
    return classList[0]
  if len(dataSet[0]) == 1: # 所有特征已經(jīng)用完
    return majorityCnt(classList)
  bestFeat = chooseBestFeatureToSplit(dataSet)
  bestFeatLabel = labels[bestFeat]
  myTree = {bestFeatLabel: {}}
  del (labels[bestFeat])
  featValues = [example[bestFeat] for example in dataSet]
  uniqueVals = set(featValues)
  for value in uniqueVals:
    subLabels = labels[:] # 為了不改變原始列表的內(nèi)容復(fù)制了一下
    myTree[bestFeatLabel][value] = createTree(splitDataSet(dataSet,
                                bestFeat, value), subLabels)
  return myTree


def main():
  data, label = createDataSet()
  t1 = time.clock()
  myTree = createTree(data, label)
  t2 = time.clock()
  print myTree
  print 'execute for ', t2 - t1


if __name__ == '__main__':
  main()

另外有需要云服務(wù)器可以了解下創(chuàng)新互聯(lián)scvps.cn，海內(nèi)外云服務(wù)器15元起步，三天無理由+7*72小時(shí)售后在線，公司持有idc許可證，提供“云服務(wù)器、裸金屬服務(wù)器、高防服務(wù)器、香港服務(wù)器、美國服務(wù)器、虛擬主機(jī)、免備案服務(wù)器”等云主機(jī)租用服務(wù)以及企業(yè)上云的綜合解決方案，具有“安全穩(wěn)定、簡單易用、服務(wù)可用性高、性價(jià)比高”等特點(diǎn)與優(yōu)勢，專為企業(yè)上云打造定制，能夠滿足用戶豐富、多元化的應(yīng)用場景需求。

標(biāo)題名稱：python基于ID3思想的決策樹-創(chuàng)新互聯(lián)
URL地址：http://weahome.cn/article/ddoppp.html

真实的国产乱ⅩXXX66竹夫人,五月香六月婷婷激情综合,亚洲日本VA一区二区三区,亚洲精品一区二区三区麻豆

python基于ID3思想的決策樹-創(chuàng)新互聯(lián)

其他資訊

網(wǎng)站制作

企業(yè)服務(wù)

網(wǎng)站建設(shè)

服務(wù)器托管