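A Brief Introduction to Python: Functions Are Data Too

What does it mean for a data analyst to do data analysis with Python, which parts of Python are needed, and how is it done in practice?

Analysis with Programming recently joined Planet Python. Here I'd like to share how to get started with data analysis in Python. The topics are: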


Data import

Importing a local or web-based CSV file;

Data transformation;

Descriptive statistics;

Hypothesis testing

One-sample t-test;

Visualization;

Creating custom functions.

Data Import

This is a crucial step: before any analysis, we first need to import the data. Data usually comes in CSV format, and even when it doesn't, it can at least be converted to CSV. In Python we do the following:

import pandas as pd

# Reading data locally
df = pd.read_csv('/Users/al-ahmadgaidasaad/Documents/d.csv')

# Reading data from web
data_url = ""
df = pd.read_csv(data_url)

To read the CSV file we need the pandas data analysis library. Its read_csv function can read both local files and data from the web.
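Because read_csv accepts any file-like object as well as a path or URL, the call can be tried without a file on disk. A minimal sketch, assuming the data are held in an in-memory string (csv_text is an illustrative name; its rows are the first three observations shown in the transposed output later in this article):

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for a real file or URL
csv_text = """Abra,Apayao,Benguet,Ifugao,Kalinga
1243,2934,148,3300,10553
4158,9235,4287,8063,35257
1787,1922,1955,1074,4544
"""

# read_csv parses any file-like object, so io.StringIO behaves like a file on disk
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)          # (rows, columns)
print(list(df.columns))  # the header row becomes the column labels
```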


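Data Transformation

Now that the data is in the workspace, the next step is transforming it. At this stage statisticians and scientists usually remove data that is not needed for the analysis. Let's first take a look at the data.

For R programmers, printing the first and last rows with df.head() and df.tail() is equivalent to print(head(df)), which prints the first 6 rows, and print(tail(df)), which prints the last 6 rows. The difference is that pandas prints 5 rows by default while R prints 6. So R's head(df, n = 10) becomes df.head(n = 10) in Python, and the same goes for the tail.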
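A minimal sketch of head and tail, assuming a hypothetical 79-row DataFrame with made-up values (only its shape mirrors the rice-yield data):

```python
import pandas as pd

# Hypothetical 79-row frame with the default 0..78 index, like the data above
df = pd.DataFrame({"Abra": range(79), "Kalinga": range(100, 179)})

first5 = df.head()        # pandas default: 5 rows (R's head() shows 6)
last5 = df.tail()         # the last 5 rows
first10 = df.head(n=10)   # same as head(df, n = 10) in R

print(first5)
print(last5.index.tolist())
```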


In R, column and row names are extracted with colnames and rownames respectively. In Python we use the columns and index attributes:

# Extracting column names
print(df.columns)

# OUTPUT
Index(['Abra', 'Apayao', 'Benguet', 'Ifugao', 'Kalinga'], dtype='object')

# Extracting row names or the index
print(df.index)

# OUTPUT (recent pandas; older versions printed an Int64Index listing 0 through 78)
RangeIndex(start=0, stop=79, step=1)

To transpose the data, use the T attribute:

# Transpose data
print(df.T)

# OUTPUT
              0      1      2      3      4      5      6      7      8      9  ...
Abra       1243   4158   1787  17152   1266   5576    927  21540   1039   5424  ...
Apayao     2934   9235   1922  14501   2385   7452   1099  17038   1382  10588  ...
Benguet     148   4287   1955   3536   2530    771   2796   2463   2592   1064  ...
Ifugao     3300   8063   1074  19607   3315  13134   5134  14226   6842  13828  ...
Kalinga   10553  35257   4544  31687   8520  28252   3106  36238   4973  40140  ...

          ...     69     70     71     72     73     74     75     76     77     78
Abra      ...  12763   2470  59094   6209  13316   2505  60303   6311  13345   2623
Apayao    ...  37625  19532  35126   6335  38613  20878  40065   6756  38902  18264
Benguet   ...   2354   4045   5987   3530   2585   3519   7062   3561   2583   3745
Ifugao    ...   9838  17125  18940  15560   7746  19737  19422  15910  11096  16787
Kalinga   ...  65782  15279  52437  24385  66148  16513  61808  23349  68663  16900


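Other transformations, such as sorting, are done with the sort_values method (older pandas versions also had a sort method, which has since been removed). Now let's extract a specific column. In Python this used to be done with either iloc or ix, and ix was the more flexible of the two, but ix was deprecated and then removed in pandas 1.0, so the examples below use iloc (selection by integer position) and loc (selection by label). To get the first 5 rows of the first column, we have: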

print(df.iloc[:, 0].head())

# OUTPUT
0     1243
1     4158
2     1787
3    17152
4     1266
Name: Abra, dtype: int64
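Sorting, mentioned above, is handled by sort_values in current pandas. A short sketch on a hypothetical frame built from the first five Abra and Apayao values shown in this article:

```python
import pandas as pd

df = pd.DataFrame({"Abra": [1243, 4158, 1787, 17152, 1266],
                   "Apayao": [2934, 9235, 1922, 14501, 2385]})

# Sort rows by one column, largest first; each row keeps its original index label
by_abra = df.sort_values(by="Abra", ascending=False)
print(by_abra)

# sort_index puts the rows back in their original order
restored = by_abra.sort_index()
```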

Incidentally, Python indexing starts at 0, not 1. To take the first 3 columns of the rows labeled 10 through 20, we have:

# iloc slices by position and excludes the end point, so 10:21 selects rows 10 to 20
print(df.iloc[10:21, 0:3])

# OUTPUT
     Abra  Apayao  Benguet
10    981    1311     2560
11  27366   15093     3039
12   1100    1701     2382
13   7212   11001     1088
14   1048    1427     2847
15  25679   15661     2942
16   1055    2191     2119
17   5437    6461      734
18   1029    1183     2302
19  23710   12222     2598
20   1091    2343     2654

This is equivalent to df.loc[10:20, ['Abra', 'Apayao', 'Benguet']]; loc selects by label, and a label slice includes both end points.
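The difference between the two indexers is easy to check on a hypothetical frame with the default 0 to 78 index (the values are placeholders): a position slice excludes its end point, while a label slice includes it.

```python
import pandas as pd

# Hypothetical frame with the default integer index 0..78
df = pd.DataFrame({"Abra": range(79), "Apayao": range(79),
                   "Benguet": range(79), "Ifugao": range(79)})

by_position = df.iloc[10:20, 0:3]             # positions 10..19: end excluded, 10 rows
by_label = df.loc[10:20, "Abra":"Benguet"]    # labels 10..20: end included, 11 rows

print(by_position.index.tolist())
print(by_label.index.tolist())
print(list(by_label.columns))
```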

To drop columns from the data, here column 1 (Apayao) and column 2 (Benguet), we use the drop method:

print(df.drop(df.columns[[1, 2]], axis = 1).head())

# OUTPUT
    Abra  Ifugao  Kalinga
0   1243    3300    10553
1   4158    8063    35257
2   1787    1074     4544
3  17152   19607    31687
4   1266    3315     8520

The axis argument tells the function whether to drop columns or rows: axis = 1 drops columns, and axis = 0 drops rows.
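Dropping rows works the same way with axis = 0, and drop also accepts an explicit columns= (or index=) keyword instead of axis. A minimal sketch on a hypothetical three-row frame built from values shown above:

```python
import pandas as pd

df = pd.DataFrame({"Abra": [1243, 4158, 1787],
                   "Apayao": [2934, 9235, 1922],
                   "Benguet": [148, 4287, 1955]})

no_cols = df.drop(df.columns[[1, 2]], axis=1)           # drop columns by label
no_rows = df.drop([0, 2], axis=0)                       # drop the rows labeled 0 and 2
same_as_axis1 = df.drop(columns=["Apayao", "Benguet"])  # keyword form, no axis needed

# drop returns a new frame; df itself is unchanged
print(no_cols)
print(no_rows)
```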


Descriptive Statistics

The next step is to summarize the statistical properties of the data with the describe method:

print(df.describe())

# OUTPUT
               Abra        Apayao      Benguet        Ifugao       Kalinga
count     79.000000     79.000000    79.000000     79.000000     79.000000
mean   12874.379747  16860.645570  3237.392405  12414.620253  30446.417722
std    16746.466945  15448.153794  1588.536429   5034.282019  22245.707692
min      927.000000    401.000000   148.000000   1074.000000   2346.000000
25%     1524.000000   3435.500000  2328.000000   8205.000000   8601.500000
50%     5790.000000  10588.000000  3202.000000  13044.000000  24494.000000
75%    13330.500000  33289.000000  3918.500000  16099.500000  52510.500000
max    60303.000000  54625.000000  8813.000000  21031.000000  68663.000000
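Each row of the describe table corresponds to an ordinary method call, which is handy when only one statistic is needed. A sketch assuming a hypothetical Series made of the first ten Abra values from the transposed output (so the numbers differ from the full 79-row table above):

```python
import pandas as pd

abra = pd.Series([1243, 4158, 1787, 17152, 1266, 5576, 927, 21540, 1039, 5424],
                 name="Abra")

summary = abra.describe()

# Each row of describe() has a direct counterpart
print(summary["count"], abra.count())
print(summary["mean"], abra.mean())
print(summary["std"], abra.std())          # sample standard deviation (ddof=1)
print(summary["25%"], abra.quantile(0.25))
print(summary["50%"], abra.median())
```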


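Hypothesis Testing

Python has a good package for statistical inference: the stats module in scipy. Its ttest_1samp function implements the one-sample t-test. So if we want to test the mean rice yield in the Abra column against the null hypothesis that the population mean rice yield is 15000, we have: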

from scipy import stats as ss

# Perform one sample t-test using 15000 as the true mean
print(ss.ttest_1samp(a = df.loc[:, 'Abra'], popmean = 15000))

# OUTPUT
(-1.1281738488299586, 0.26270472069109496)

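The function returns a tuple made up of the following values:

t : float or array, the t statistic

prob : float or array, the two-tailed p-value

From the output above, the p-value is about 0.263, well above α = 0.05, so there is not enough evidence to conclude that the mean rice yield differs from 15000. Applying the test to all the variables, again with a hypothesized mean of 15000, we have: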

print(ss.ttest_1samp(a = df, popmean = 15000))

# OUTPUT
(array([ -1.12817385,   1.07053437, -65.81425599,  -4.564575  ,   6.17156198]),
 array([  2.62704721e-01,   2.87680340e-01,   4.15643528e-70,
          1.83764399e-05,   2.82461897e-08]))

The first array contains the t statistics and the second the corresponding p-values.
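For a single column, the t statistic is $t = (\bar{x} - \mu_0)/(s/\sqrt{n})$, where $s$ is the sample standard deviation, and the two-sided p-value comes from a t distribution with $n - 1$ degrees of freedom. A sketch that computes both by hand and compares them with ttest_1samp, assuming a hypothetical sample of the first ten Abra values (so the numbers differ from the full-column result above):

```python
import numpy as np
from scipy import stats as ss

# Hypothetical sample: the first ten Abra values from the transposed output
x = np.array([1243, 4158, 1787, 17152, 1266, 5576, 927, 21540, 1039, 5424], dtype=float)
mu0 = 15000  # hypothesized population mean

n = x.size
# t = (sample mean - mu0) / (sample std / sqrt(n)), using ddof=1 for the sample std
t_manual = (x.mean() - mu0) / (x.std(ddof=1) / np.sqrt(n))
# Two-tailed p-value from the t distribution with n - 1 degrees of freedom
p_manual = 2 * ss.t.sf(abs(t_manual), df=n - 1)

# The result object unpacks into (statistic, pvalue)
t_scipy, p_scipy = ss.ttest_1samp(x, popmean=mu0)
print(t_manual, t_scipy)
print(p_manual, p_scipy)
```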


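Visualization

Python has many visualization modules, the most popular being the matplotlib library. Other options worth mentioning are bokeh and seaborn. In an earlier blog post I demonstrated matplotlib's box-and-whisker plot function.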


# Import the module for plotting
import matplotlib.pyplot as plt

df.plot(kind = 'box')
plt.show()

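We can also style the chart with a ggplot theme modeled on R's ggplot2. The original code did this with pandas' pd.options.display.mpl_style option, which has since been removed from pandas; with matplotlib's built-in ggplot style sheet it still takes just one extra line:

import matplotlib.pyplot as plt

plt.style.use('ggplot')  # Sets the plotting theme to resemble ggplot2
df.plot(kind = 'box')
plt.show()

This gives the same box plot with a ggplot-style theme.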
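If you prefer not to change the style globally, matplotlib can also apply the ggplot style sheet temporarily with plt.style.context. A sketch, assuming matplotlib and pandas are installed; it renders to an in-memory PNG with the non-interactive Agg backend so it runs without a display, and the data are the first five rows of three columns from the outputs above:

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({"Abra": [1243, 4158, 1787, 17152, 1266],
                   "Apayao": [2934, 9235, 1922, 14501, 2385],
                   "Benguet": [148, 4287, 1955, 3536, 2530]})

# The ggplot style applies only inside this block; global settings stay untouched
with plt.style.context("ggplot"):
    ax = df.plot(kind="box")
    buf = io.BytesIO()
    ax.figure.savefig(buf, format="png")
    plt.close(ax.figure)

png_bytes = buf.getvalue()
```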


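The ggplot theme looks much cleaner than the default matplotlib.pyplot style. In this article, however, I'd rather introduce the seaborn module, a library for statistical data visualization. So we have:

# Import the seaborn library
import seaborn as sns

# Do the boxplot
sns.boxplot(data = df, width = 0.5, palette = "pastel")
plt.show()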


Now that's an attractive box plot. Let's keep going.


sns.violinplot(data = df, width = 0.5, palette = "pastel")
plt.show()


sns.displot(x = df.iloc[:, 2], bins = 15, kde = True, rug = True)  # displot replaces the deprecated distplot
plt.show()


with sns.axes_style("white"):
    sns.jointplot(x = df.iloc[:, 1], y = df.iloc[:, 2], kind = "kde")
plt.show()


sns.lmplot(x = "Benguet", y = "Ifugao", data = df)
plt.show()


Creating Custom Functions

In Python, we define a custom function with the def keyword. For example, a function that adds two numbers can be defined as follows:

def add_2int(x, y):
    return x + y

print(add_2int(2, 2))

# OUTPUT
4

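By the way, indentation matters in Python: it defines the scope of a function, just as curly braces {...} do in R. Here is an example from an earlier blog post:

Generate a sample of 10 values from a normal distribution with $\mu = 3$ and $\sigma^2 = 5$;

Compute $\bar{x}$ and the 95% confidence interval $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$;

Repeat 100 times; then

Compute the percentage of confidence intervals that contain the true mean.

In Python, the program is as follows: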

import numpy as np
import scipy.stats as ss

def case(n = 10, mu = 3, sigma = np.sqrt(5), p = 0.025, rep = 100):
    # Each row of m holds: sample mean, lower bound, upper bound, 1/0 "contains mu" flag
    m = np.zeros((rep, 4))
    for i in range(rep):
        norm = np.random.normal(loc = mu, scale = sigma, size = n)
        xbar = np.mean(norm)
        # Interval half-width: z_{1-p} * sigma / sqrt(n)
        low = xbar - ss.norm.ppf(q = 1 - p) * (sigma / np.sqrt(n))
        up = xbar + ss.norm.ppf(q = 1 - p) * (sigma / np.sqrt(n))
        # Does the interval contain the true mean?
        if (mu > low) and (mu < up):
            rem = 1
        else:
            rem = 0
        m[i, :] = [xbar, low, up, rem]
    inside = np.sum(m[:, 3])
    per = inside / rep * 100
    desc = ("There are " + str(int(inside)) + " confidence intervals that contain "
            "the true mean (" + str(mu) + "), that is " + str(per) + " percent of the total CIs")
    return {"Matrix": m, "Decision": desc}

The code above is easy to read, but the loop makes it slow. The version below improves on it, thanks to a Python expert:

import numpy as np
import scipy.stats as ss

def case2(n = 10, mu = 3, sigma = np.sqrt(5), p = 0.025, rep = 100):
    # Critical value times the standard error, computed once
    scaled_crit = ss.norm.ppf(q = 1 - p) * (sigma / np.sqrt(n))
    # Draw all samples at once: one row of n values per repetition
    norm = np.random.normal(loc = mu, scale = sigma, size = (rep, n))
    xbar = norm.mean(1)
    low = xbar - scaled_crit
    up = xbar + scaled_crit
    # Element-wise check of whether each interval contains mu
    rem = (mu > low) & (mu < up)
    m = np.c_[xbar, low, up, rem]
    inside = np.sum(m[:, 3])
    per = inside / rep * 100
    desc = ("There are " + str(int(inside)) + " confidence intervals that contain "
            "the true mean (" + str(mu) + "), that is " + str(per) + " percent of the total CIs")
    return {"Matrix": m, "Decision": desc}
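The speed-up in case2 comes from drawing all rep × n values at once and taking row means, instead of looping rep times. A self-contained sketch of that idea; the helper name coverage, the seed and rep = 10000 are illustrative assumptions. It checks that the loop and the vectorized call give the same sample means and that the coverage is close to the nominal 95%:

```python
import numpy as np
import scipy.stats as ss

def coverage(n=10, mu=3, sigma=np.sqrt(5), p=0.025, rep=10000, seed=0):
    """Hypothetical helper: share of z-intervals that contain mu."""
    rng = np.random.default_rng(seed)
    samples = rng.normal(loc=mu, scale=sigma, size=(rep, n))  # one row per repetition

    # Loop version: one mean per row, computed in Python
    loop_means = np.array([row.mean() for row in samples])
    # Vectorized version: all row means in a single NumPy call
    vec_means = samples.mean(axis=1)

    half_width = ss.norm.ppf(1 - p) * sigma / np.sqrt(n)
    inside = (mu > vec_means - half_width) & (mu < vec_means + half_width)
    return loop_means, vec_means, inside.mean()

loop_means, vec_means, share = coverage()
print(np.allclose(loop_means, vec_means), share)
```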

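The Difference Between Classes and Functions in Python

I. Different in nature

1. Class: the foundation for encapsulating information in object-oriented programming.

2. Function: a group of statements bundled together to perform a particular task; also called a subroutine or, in OOP, a method.

II. Different characteristics

1. Class: a user-defined reference data type, also called a class type. Each class contains data declarations and a set of functions that operate on the data or pass messages. Instances of a class are called objects.

2. Function: in languages such as C++, functions are divided into global functions and global static functions, and a class can also define constructors, destructors, copy constructors, member functions, friend functions, operator-overloading functions, inline functions, and so on.

III. Different rules

1. Class: essentially a reference data type, similar to basic data types such as byte, short, int (char), long, float and double, except that it is a complex data type.

2. Function: a function must be declared before it can be called. The call syntax is function_name(arguments), and the number of actual arguments in the parentheses must match the number of formal parameters in the function's declaration.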

Source: Baidu Baike, "Function"

Source: Baidu Baike, "Class"
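In Python the distinction can be sketched as follows (area and Rectangle are hypothetical names for illustration): a function is a standalone block of code, while a class bundles data with the functions, called methods, that operate on it.

```python
# A plain function: a named block of code that does one job
def area(width, height):
    return width * height


# A class: bundles data (attributes) with the functions that act on it (methods)
class Rectangle:
    def __init__(self, width, height):
        self.width = width
        self.height = height

    def area(self):
        return self.width * self.height


r = Rectangle(3, 4)  # an instance (object) of the class
print(area(3, 4), r.area())
```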

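What type-conversion functions does Python provide, and what do they do?

They convert suitable data into the type you need: int() for integers, float() for floating-point numbers, str() for strings, list() for lists, tuple() for tuples, set() for sets, and so on.

For example, a='12' is a string. After a=int(a), the variable a is an integer: the string '12' has become the integer 12. Python does not require variable declarations; a variable's type is determined when it is assigned, which makes variable types very flexible.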
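A few more conversions, including the error raised when a string cannot be converted directly (the variable names are illustrative):

```python
a = "12"
n = int(a)            # str -> int
f = float("3.5")      # str -> float
s = str(n + 1)        # int -> str

# int() rejects strings that are not whole numbers...
try:
    int("3.5")
except ValueError as exc:
    print("ValueError:", exc)

# ...but going through float first works (the fractional part is truncated)
t = int(float("3.5"))

print(n, f, s, t, type(n).__name__, type(s).__name__)
```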

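A common exercise is checking whether an integer is a palindrome, which is easy with strings:

a = 1234321  # an integer
if str(a) == str(a)[::-1]:  # comparing with the reversed string tells us whether it is a palindrome
    print("palindrome")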

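Another example is the tuple b = (1, 3, 2, 4). A tuple's members cannot be updated, deleted or sorted, but a list's can, so converting with list() lets us update, delete from and sort what was a tuple.

b = (1, 3, 2, 4)
b = list(b)
b.sort()
b = tuple(b)

Now b is the ascending tuple (1, 2, 3, 4).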
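The built-in sorted() accepts any iterable, including a tuple, and returns a new list, so the same result can be had in one line. A minimal sketch:

```python
b = (1, 3, 2, 4)

# sorted() takes any iterable and returns a new list, replacing the list()/sort() steps
b_sorted = tuple(sorted(b))
b_desc = tuple(sorted(b, reverse=True))

print(b_sorted, b_desc)
```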

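Similarly, reading input to create a list of integers or a tuple of integers looks almost the same; only the final conversion function differs.

ls = list(map(int, input().split()))  # this gives a list
tup = tuple(map(int, input().split()))  # this gives a tuple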

Then there are sets. Set elements are unique, so sets make deduplication easy.

ls = [1, 2, 3, 1, 2, 3, 1, 2, 3]
ls = list(set(ls))  # after deduplicating with set(), ls holds [1, 2, 3]
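Note that a set does not guarantee any particular element order. When the original order matters, dict.fromkeys is a common alternative, since dictionaries preserve insertion order from Python 3.7 on. A minimal sketch with an illustrative list:

```python
ls = [3, 1, 2, 3, 1, 2]

# set() removes duplicates but promises no particular order
unique_any_order = list(set(ls))

# dict keys keep insertion order (Python 3.7+), so first occurrences stay in order
unique_in_order = list(dict.fromkeys(ls))

print(sorted(unique_any_order), unique_in_order)
```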

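How should we understand "functions in Python are first-class citizens"?

Taken at face value, the statement "functions in Python are first-class citizens" could carry several meanings:

The correct reading is points 1 and 2, not point 3.

The main point of "functions in Python are first-class citizens" is that in the Python world, everyone is equal.

A world in which everyone is equal should mean at least two things: 1. equal identity and status; 2. equal rights to act.

The printed output is:

From the printed output above we can see that:

1. All data in a Python program is an instance of some class, and is therefore an object;

2. Classes themselves are also objects. int, float, str, list, dict, set, function, module, NoneType, object, type and so on are all instances of the type class, and therefore objects;

3. object is the base class of all classes;

4. object is the top-level parent class.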
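These four points can be checked directly. A minimal sketch; the sample values and the function add are illustrative choices:

```python
import math
import types


def add(x, y):
    return x + y


values = [1, 2.5, "text", [1], {"k": 1}, {1}, add, math, None]

# 1. every value is an instance of some class, hence an object
all_objects = all(isinstance(v, object) for v in values)

# 2. classes are objects too: each one is an instance of type
classes = [int, float, str, list, dict, set, types.FunctionType,
           types.ModuleType, type(None), object, type]
all_types = all(isinstance(c, type) for c in classes)

# 3 and 4. object appears at the top of every class's method resolution order
object_is_root = all(object in c.__mro__ for c in classes)

for v in values:
    print(type(v).__name__)  # int, float, str, list, dict, set, function, module, NoneType
print(all_objects, all_types, object_is_root)
```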

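Functions, like every other citizen, share one common identity: they are objects.

In C++ and Java, data is data and actions are actions, and a class combines the two. In Python, data is data, and actions can be data too; this remarkable kind of data is called an object.

A function can quietly serve as a piece of data, and it can also gracefully perform an action.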
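Concretely, treating a function as data means it can be assigned to a name, stored in a container, passed as an argument and returned from another function. A minimal sketch with hypothetical helper names (square, cube, apply_all, make_adder):

```python
def square(x):
    return x * x


def cube(x):
    return x ** 3


# Assign a function to another name: nothing is called, the object is just referenced
f = square

# Store functions in a container, like any other data
ops = {"square": square, "cube": cube}


# Pass a function as an argument
def apply_all(func, items):
    return [func(i) for i in items]


# Build and return a new function
def make_adder(n):
    def adder(x):
        return x + n
    return adder


add10 = make_adder(10)
print(f(3), ops["cube"](2), apply_all(square, [1, 2, 3]), add10(5))
```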

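The official Python documentation explains objects like this: "Objects are Python's abstraction for data. All data in a Python program is represented by objects or by relations between objects." and "Every object has an identity, a type and a value."

Two ancestors (a and b) occupy two adjacent blocks of memory. One of them can share its memory with its "descendants", while the other can only make its "descendants" set up house elsewhere. When they reach the end of their life cycles, b is garbage-collected immediately and its memory-address inheritance is taken away, whereas a's form perishes but its substance lives on, sheltering later generations.

Python tilts its resources toward these objects, which in effect legitimizes a kind of entrenched class hierarchy. The division is based on how commonly the objects are used: sharing memory means less overhead and more efficient memory use.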
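The "privileged" objects described here correspond to what CPython does with small integers (roughly -5 to 256), which are preallocated and shared. A sketch, assuming CPython; this is an implementation detail rather than a language guarantee, so real code should compare values with == and never with is (same_object is a hypothetical helper):

```python
import platform


def same_object(a, b):
    # Build two ints at runtime so the compiler cannot merge them into one constant
    x = a + b
    y = a + b
    return x is y, x == y


small = same_object(200, 56)  # 256: inside CPython's small-int cache, one shared object
large = same_object(200, 57)  # 257: outside the cache, a fresh object each time

print(platform.python_implementation(), small, large)
```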

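This is what makes Python interesting: on one side there is a citizenry of equals, on the other a privileged caste, forming a seemingly contradictory dual structure.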


Title: A Brief Introduction to Python: Functions Are Data Too
Link: http://weahome.cn/article/hdojgc.html
