NumPy and Pandas: One-Dimensional Data Analysis

One-Dimensional Data Analysis

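Both packages must be installed first. With conda, the install command takes the package names separated by spaces:

    conda install numpy pandas

Then import them:

    # Import the numpy package
    import numpy as np
    # Import the pandas package
    import pandas as pd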

One-Dimensional Data Analysis: NumPy

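    # Define a one-dimensional array; the argument is the list [2, 3, 4, 5]
    a = np.array([2, 3, 4, 5])

    # Look up an element by its index
    a[0]
    2

    # Slicing: get the elements in a range of indices
    # a[1:3] returns the elements from index 1 up to, but not including, index 3
    a[1:3]
    array([3, 4])

For details on dtype, see: https://docs.scipy.org/doc/numpy-1.10.1/reference/arrays.dtypes.html

    # Check the data type (dtype)
    # The default integer type depends on the platform, so this may show int64 instead
    a.dtype
    dtype('int32')

    # Summary statistics
    # Mean
    a.mean()
    3.5

    # Standard deviation
    a.std()
    1.118033988749895

    # Vectorized operation: multiply by a scalar
    b = np.array([1, 2, 3])
    c = b * 4
    c
    array([ 4,  8, 12])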
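One detail worth checking when reading the standard deviation: by default, NumPy's `std()` divides by N (the population formula), whereas pandas divides by N - 1 (the sample formula). The sketch below, which reuses the array `[2, 3, 4, 5]` and introduces the illustrative names `pop_std` and `sample_std`, shows how the `ddof` parameter switches between the two.

```python
import numpy as np

a = np.array([2, 3, 4, 5])

# Default: ddof=0, divide by N (population standard deviation)
pop_std = a.std()

# ddof=1: divide by N - 1 (sample standard deviation)
sample_std = a.std(ddof=1)

# Computing both by hand from the definition, as a cross-check
deviations = a - a.mean()
manual_pop = np.sqrt((deviations ** 2).sum() / len(a))
manual_sample = np.sqrt((deviations ** 2).sum() / (len(a) - 1))

print(pop_std, sample_std)
```

The sample version is always somewhat larger for the same data, which is why a NumPy result and a pandas result for the same values can differ.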

One-Dimensional Data Analysis: Pandas

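    # Define a Series, the one-dimensional data structure in Pandas
    # It holds the share prices of 6 companies on a given day (in US dollars).
    # Tencent's price of 427.4 Hong Kong dollars converts to 54.74 US dollars.
    stockS = pd.Series([54.74, 190.9, 173.14, 1050.3, 181.86, 1139.49],
                       index=['Tencent', 'Alibaba', 'Apple', 'Google', 'Facebook', 'Amazon'])
    stockS
    Tencent       54.74
    Alibaba      190.90
    Apple        173.14
    Google      1050.30
    Facebook     181.86
    Amazon      1139.49
    dtype: float64

    # Get descriptive statistics
    stockS.describe()
    count       6.000000
    mean      465.071667
    std       491.183757
    min        54.740000
    25%       175.320000
    50%       186.380000
    75%       835.450000
    max      1139.490000
    dtype: float64

    # The iloc attribute gets a value by its integer position
    stockS.iloc[0]
    54.74

    # The loc attribute gets a value by its index label
    stockS.loc['Tencent']
    54.74

    # Vectorized operation: adding two Series
    # Values are matched by index label; a label present in only one Series yields NaN
    s1 = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
    s2 = pd.Series([10, 20, 30, 40], index=['a', 'b', 'e', 'f'])
    s3 = s1 + s2
    s3
    a    11.0
    b    22.0
    c     NaN
    d     NaN
    e     NaN
    f     NaN
    dtype: float64

    # Method 1: drop the missing values
    s3.dropna()
    a    11.0
    b    22.0
    dtype: float64

    # Method 2: fill in the missing values
    # fill_value=0 stands in for a label missing from one Series before the addition
    s3 = s1.add(s2, fill_value=0)
    s3
    a    11.0
    b    22.0
    c     3.0
    d     4.0
    e    30.0
    f    40.0
    dtype: float64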
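The difference between `loc` and `iloc` is easiest to see when the index labels are themselves integers that do not match the row order. The following is a minimal sketch using a made-up Series `s` (not part of the stock example); it also shows which labels end up as NaN under plain addition compared with `add(..., fill_value=0)`.

```python
import pandas as pd

# Hypothetical Series whose integer labels are deliberately out of order
s = pd.Series([10, 20, 30], index=[2, 0, 1])

by_label = s.loc[0]      # the value whose label is 0
by_position = s.iloc[0]  # the value in the first row

s1 = pd.Series([1, 2, 3, 4], index=["a", "b", "c", "d"])
s2 = pd.Series([10, 20, 30, 40], index=["a", "b", "e", "f"])

# Plain addition aligns on labels; unmatched labels become NaN
plain_sum = s1 + s2
missing_labels = plain_sum[plain_sum.isna()].index.tolist()

# fill_value=0 treats a label missing from one side as 0 before adding
filled_sum = s1.add(s2, fill_value=0)

print(by_label, by_position)
print(missing_labels)
print(filled_sum)
```

Here `by_label` and `by_position` differ because label 0 sits in the second row, so it can help to choose `loc` or `iloc` explicitly rather than relying on plain `[]` indexing.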