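# Lab4_1 Pandas Data Analysis

Importing data with pandas; data transformation; summary statistics; hypothesis testing; visualization, etc.

## Experimental Environment

### Hardware

• Intel(R) Core(TM) i7-6567U CPU @ 3.30GHz 3.31GHz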
• 8.00GB RAM

### Software

• Windows 10, 64-bit (Build 17763) 10.0.17763
• Windows Subsystem for Linux [Ubuntu 18.04.2 LTS]: WSL is a Linux subsystem that runs as software under Windows. It is a tool Microsoft introduced in recent years that runs Linux natively on a Windows system.
• Python 3.7.4 64-bit ('anaconda3': virtualenv)
• Jupyter Notebook

## Steps, Code, Results, and Analysis

### Import the data and convert the pandas DataFrame to a NumPy array

#### input

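``````
import pandas as pd
import numpy as np

df = pd.read_csv('anscombe.csv')  # assumes anscombe.csv (listed at the end) is in the working directory
np.array(df)  # convert the pandas DataFrame to a NumPy array
``````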

#### output

``````
array([['I', 10.0, 8.04],
['I', 8.0, 6.95],
['I', 13.0, 7.58],
['I', 9.0, 8.81],
['I', 11.0, 8.33],
['I', 14.0, 9.96],
['I', 6.0, 7.24],
['I', 4.0, 4.26],
['I', 12.0, 10.84],
['I', 7.0, 4.82],
['I', 5.0, 5.68],
['II', 10.0, 9.14],
['II', 8.0, 8.14],
['II', 13.0, 8.74],
['II', 9.0, 8.77],
['II', 11.0, 9.26],
['II', 14.0, 8.1],
['II', 6.0, 6.13],
['II', 4.0, 3.1],
['II', 12.0, 9.13],
['II', 7.0, 7.26],
['II', 5.0, 4.74],
['III', 10.0, 7.46],
['III', 8.0, 6.77],
['III', 13.0, 12.74],
['III', 9.0, 7.11],
['III', 11.0, 7.81],
['III', 14.0, 8.84],
['III', 6.0, 6.08],
['III', 4.0, 5.39],
['III', 12.0, 8.15],
['III', 7.0, 6.42],
['III', 5.0, 5.73],
['IV', 8.0, 6.58],
['IV', 8.0, 5.76],
['IV', 8.0, 7.71],
['IV', 8.0, 8.84],
['IV', 8.0, 8.47],
['IV', 8.0, 7.04],
['IV', 8.0, 5.25],
['IV', 19.0, 12.5],
['IV', 8.0, 5.56],
['IV', 8.0, 7.91],
['IV', 8.0, 6.89]], dtype=object)
``````
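The `dtype=object` in the output above appears because the `dataset` column holds strings while `x` and `y` hold floats, so NumPy falls back to a generic object array. The following is a minimal sketch of the same conversion using `DataFrame.to_numpy()`. The four rows are copied from `anscombe.csv`, and the names `csv_text`, `mixed`, and `numeric` are chosen here for illustration only.

```python
import io

import pandas as pd

# A few rows copied from anscombe.csv, inlined so the sketch runs on its own.
csv_text = """dataset,x,y
I,10.0,8.04
I,8.0,6.95
II,10.0,9.14
IV,19.0,12.5
"""
df = pd.read_csv(io.StringIO(csv_text))

# Converting all columns at once mixes strings and floats -> object dtype.
mixed = df.to_numpy()

# Selecting only the numeric columns first yields a float64 array.
numeric = df[["x", "y"]].to_numpy()

print(mixed.dtype, numeric.dtype, numeric.shape)
```

Selecting the numeric columns before converting is usually preferable when the array is going to be passed to NumPy's numerical functions, since those generally expect a numeric dtype.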

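### pandas missing-value handling: reindex(), dropna(), fillna(), isnull()

``````
# If df keeps its default integer index, the labels 'a', 'b', 'c' do not exist, so every value is NaN
df.reindex(['a','b','c'])
``````

``````
# Drop every row that contains a NaN
df.reindex(['a','b','c']).dropna()
``````

``````
# Replace every NaN with a constant
df.reindex(['a','b','c']).fillna(17341163)
``````

``````
# Boolean mask: True where a value is missing
df.isnull()
``````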
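To make the effect of these four calls easier to see, here is a small self-contained sketch. It assumes a toy frame with the string index `a`, `b`, `d` (not part of the original experiment), so that reindexing to `a`, `b`, `c` produces exactly one all-NaN row. The variable names and the fill value `0.0` are illustrative choices.

```python
import pandas as pd

# Toy frame with a string index; values are the first three rows of dataset I.
df = pd.DataFrame(
    {"x": [10.0, 8.0, 13.0], "y": [8.04, 6.95, 7.58]},
    index=["a", "b", "d"],
)

# 'c' is not in the index, so its row is filled with NaN; 'd' is dropped.
reindexed = df.reindex(["a", "b", "c"])

mask = reindexed.isnull()        # True where a value is missing
dropped = reindexed.dropna()     # keeps only rows without NaN
filled = reindexed.fillna(0.0)   # replaces NaN with a constant

print(reindexed)
print(mask)
print(dropped)
print(filled)
```

None of these methods modifies `df` in place by default; each returns a new object.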

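### Explore the pandas visualization tools (line charts, pie charts, etc.) to analyze the dataset in depth

``````
import matplotlib.pyplot as plt

df.mean(numeric_only=True).plot(kind='line')  # line chart of the column means of x and y
plt.show()
``````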
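A line chart of the column means says little about how the four datasets differ. As one possible next step, the sketch below draws a scatter plot per dataset with `DataFrame.plot(kind="scatter")`. The values are copied from `anscombe.csv`. The helper names (`x123`, `anscombe`), the 2×2 layout, and the output file name `anscombe_scatter.png` are assumptions made for this example.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen so this runs without a display; drop this line in Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Values copied from anscombe.csv; datasets I-III share the same x values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
anscombe = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}
df = pd.concat(
    [pd.DataFrame({"dataset": name, "x": xs, "y": ys}) for name, (xs, ys) in anscombe.items()],
    ignore_index=True,
)

# One scatter panel per dataset, with shared axes so the shapes are comparable.
fig, axes = plt.subplots(2, 2, sharex=True, sharey=True, figsize=(8, 6))
for ax, (name, group) in zip(axes.flat, df.groupby("dataset")):
    group.plot(kind="scatter", x="x", y="y", ax=ax, title=f"Dataset {name}")

fig.tight_layout()
fig.savefig("anscombe_scatter.png")
```

Sharing the axes across the panels is a deliberate choice here: it makes the outliers in datasets III and IV stand out against the other two.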

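### Explore grouping and statistical operations in pandas. Use pandas or NumPy statistical functions to compute summary statistics for the Anscombe data

``````
# Mean of x and y within each of the four datasets
df.groupby(by="dataset").mean()
``````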
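The group means alone do not show what makes this dataset interesting. The sketch below extends the grouping to the variance of each column and the Pearson correlation between `x` and `y`. The values are copied from `anscombe.csv`. The result names (`stats`, `corr`) and the output column names are illustrative.

```python
import pandas as pd

# Values copied from anscombe.csv; datasets I-III share the same x values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
anscombe = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}
df = pd.concat(
    [pd.DataFrame({"dataset": name, "x": xs, "y": ys}) for name, (xs, ys) in anscombe.items()],
    ignore_index=True,
)

# Named aggregation: mean and sample variance of x and y per dataset.
stats = df.groupby("dataset").agg(
    x_mean=("x", "mean"),
    x_var=("x", "var"),
    y_mean=("y", "mean"),
    y_var=("y", "var"),
)

# Pearson correlation between x and y within each dataset.
corr = df.groupby("dataset")[["x", "y"]].apply(lambda g: g["x"].corr(g["y"]))

print(stats.round(2))
print(corr.round(3))
```

All four groups have an `x` mean of 9, an `x` variance of 11, a `y` mean of about 7.50, and a correlation of about 0.816, even though their scatter plots look very different. This is the reason summary statistics should be read together with a plot.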

## Dataset used: `anscombe.csv`

``````
dataset,x,y
I,10.0,8.04
I,8.0,6.95
I,13.0,7.58
I,9.0,8.81
I,11.0,8.33
I,14.0,9.96
I,6.0,7.24
I,4.0,4.26
I,12.0,10.84
I,7.0,4.82
I,5.0,5.68
II,10.0,9.14
II,8.0,8.14
II,13.0,8.74
II,9.0,8.77
II,11.0,9.26
II,14.0,8.1
II,6.0,6.13
II,4.0,3.1
II,12.0,9.13
II,7.0,7.26
II,5.0,4.74
III,10.0,7.46
III,8.0,6.77
III,13.0,12.74
III,9.0,7.11
III,11.0,7.81
III,14.0,8.84
III,6.0,6.08
III,4.0,5.39
III,12.0,8.15
III,7.0,6.42
III,5.0,5.73
IV,8.0,6.58
IV,8.0,5.76
IV,8.0,7.71
IV,8.0,8.84
IV,8.0,8.47
IV,8.0,7.04
IV,8.0,5.25
IV,19.0,12.5
IV,8.0,5.56
IV,8.0,7.91
IV,8.0,6.89
``````