吴裕雄--天生自然 Python Data Analysis: Wine Analysis
Published: 2019-06-06


# import pandas
import pandas as pd

# creating a DataFrame
pd.DataFrame({'Yes': [50, 31], 'No': [101, 2]})

# another example of creating a DataFrame
pd.DataFrame({'Bob': ['I liked it.', 'It was awful.'], 'Sue': ['Pretty good.', 'Bland']})

# the 'index' parameter assigns custom row labels instead of 0, 1, ...
pd.DataFrame({'Bob': ['I liked it.', 'It was awful.'], 'Sue': ['Pretty good.', 'Bland.']},
             index=['Product A', 'Product B'])

# creating a pandas Series
pd.Series([1, 2, 3, 4, 5])

# we can think of a Series as a single column of a DataFrame.
# we can assign index values to a Series in the same way as for a pandas DataFrame
pd.Series([10, 20, 30], index=['2015 sales', '2016 sales', '2017 sales'], name='Product A')
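To make the "Series as a column" idea concrete, here is a minimal sketch. The second Series (`product_b`) and its numbers are made up for illustration and are not part of the original post.

```python
import pandas as pd

# Hypothetical sales figures, used only for illustration.
product_a = pd.Series([10, 20, 30],
                      index=['2015 sales', '2016 sales', '2017 sales'],
                      name='Product A')
product_b = pd.Series([5, 15, 25],
                      index=['2015 sales', '2016 sales', '2017 sales'],
                      name='Product B')

# Combining the two Series column-wise gives a DataFrame whose
# column names come from each Series' 'name'.
sales = pd.concat([product_a, product_b], axis=1)

# Selecting a single column returns a Series again.
column = sales['Product A']
```

Here `column` carries the same index labels and the same name as the Series it was built from.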

# reading a CSV file and storing it in a variable
wine_reviews = pd.read_csv("F:\\kaggleDataSet\\wine-reviews\\winemag-data-130k-v2.csv")

# we can use the 'shape' attribute to check the size of the dataset (rows, columns)
wine_reviews.shape

# to show the first five rows of data, use the 'head()' method
wine_reviews.head()

# the CSV already has an index column; 'index_col=0' uses it instead of creating a new one
wine_reviews = pd.read_csv("F:\\kaggleDataSet\\wine-reviews\\winemag-data-130k-v2.csv", index_col=0)
wine_reviews.head()

# save the first five rows to a new CSV file
wine_reviews.head().to_csv("F:\\wine_reviews.csv")
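The effect of `index_col=0` is easiest to see on a round trip. The sketch below uses a tiny made-up table and an in-memory buffer (`io.StringIO`) instead of the real file, so the data and variable names are assumptions for illustration only.

```python
import io
import pandas as pd

# Hypothetical two-row table standing in for the wine data.
df = pd.DataFrame({'country': ['Italy', 'France'], 'points': [87, 92]})

# to_csv() writes the index as an unnamed first column by default.
csv_text = df.to_csv()

# Without index_col, that saved index comes back as a regular column
# named 'Unnamed: 0'.
plain = pd.read_csv(io.StringIO(csv_text))

# With index_col=0, the first column is used as the index instead.
indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)
```

`plain` ends up with three columns and `indexed` with two, which is why the second `read_csv` call above passes `index_col=0`.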

import pandas as pd

reviews = pd.read_csv("F:\\kaggleDataSet\\wine-reviews\\winemag-data-130k-v2.csv", index_col=0)

# limit how many rows pandas prints when displaying a DataFrame
pd.set_option("display.max_rows", 5)
reviews

# access the 'country' property (or column) of 'reviews'
reviews.country

# another way to do the above operation
# when a column name contains a space, we have to use this method
reviews['country']

# to access the first row of the country column
reviews['country'][0]

# returns the first row
reviews.iloc[0]

# returns the first column (country); ':' selects all rows
reviews.iloc[:, 0]

# returns the first 3 rows of the first column
reviews.iloc[:3, 0]

# we can pass a list of row/column positions to select
reviews.iloc[[0, 1, 2, 3], 0]

# negative positions count from the end, as in plain Python; this returns the last five rows
reviews.iloc[-5:]

# to select the first entry in the country column by label
reviews.loc[0, 'country']

# select columns by name using 'loc'
reviews.loc[:, ['taster_name', 'taster_twitter_handle', 'points']]
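One difference between `iloc` and `loc` that the calls above do not show is how slices are bounded. A small sketch, using a made-up four-row DataFrame rather than the real wine data:

```python
import pandas as pd

# Hypothetical rows standing in for the wine reviews.
df = pd.DataFrame({
    'country': ['Italy', 'Portugal', 'US', 'France'],
    'points': [87, 87, 90, 92],
})

# iloc is position-based and excludes the end of a slice: positions 0 and 1.
by_position = df.iloc[0:2]

# loc is label-based and includes the end of a slice: labels 0, 1 and 2.
by_label = df.loc[0:2]
```

With the default integer index the two look interchangeable, but `by_position` has two rows while `by_label` has three.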

# use 'set_index' to make the 'title' field the index
reviews.set_index('title')
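Note that `set_index` returns a new DataFrame by default and leaves the original untouched. A minimal sketch with invented titles (`'Wine A'`, `'Wine B'` are placeholders, not rows from the dataset):

```python
import pandas as pd

# Hypothetical data; the titles are placeholders.
df = pd.DataFrame({'title': ['Wine A', 'Wine B'], 'points': [87, 92]})

# set_index returns a new DataFrame indexed by 'title'.
by_title = df.set_index('title')

# The original still has 'title' as an ordinary column.
still_has_title = 'title' in df.columns

# Rows of the new DataFrame can now be looked up by title.
wine_b_points = by_title.loc['Wine B', 'points']
```

To keep the re-indexed table, assign the result to a variable as done here.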

# 1. find out whether each wine is produced in Italy (returns a boolean Series)
reviews.country == 'Italy'

# 2. now select all wines produced in Italy
# equivalent: reviews[reviews.country == 'Italy']
reviews.loc[reviews.country == 'Italy']

# add one more condition on points to find better-than-average wines produced in Italy
# use | for an 'OR' condition
reviews.loc[(reviews.country == 'Italy') & (reviews.points >= 90)]

# select wines whose country is in a list of values
reviews.loc[reviews.country.isin(['Italy', 'France'])]

# select wines whose price is not missing (not NaN)
reviews.loc[reviews.price.notnull()]
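The conditional selections above all work by building a boolean mask and passing it to `loc`. The sketch below repeats them on a made-up four-row table so the results can be checked by eye; the data is an assumption for illustration.

```python
import pandas as pd

# Hypothetical rows; None in 'price' becomes NaN (a missing value).
df = pd.DataFrame({
    'country': ['Italy', 'Italy', 'France', 'US'],
    'points': [92, 85, 91, 88],
    'price': [30.0, None, 45.0, 20.0],
})

is_italy = df.country == 'Italy'   # boolean Series, one value per row
high_score = df.points >= 90

both = df.loc[is_italy & high_score]     # AND: Italian wines scoring 90+
either = df.loc[is_italy | high_score]   # OR: Italian wines, or any wine scoring 90+
italy_or_france = df.loc[df.country.isin(['Italy', 'France'])]
priced = df.loc[df.price.notnull()]      # rows with a known price
```

When the comparisons are written inline, each one needs its own parentheses, because `&` and `|` bind more tightly than `==` and `>=` in Python.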

# assign a constant value to every row of a new column
reviews['critic'] = 'everyone'
reviews.critic

# using an iterable to assign one value per row
reviews['index_backwards'] = range(len(reviews), 0, -1)
reviews['index_backwards']
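The two assignments above behave differently: a single value is repeated for every row, while an iterable must supply exactly one value per row. A small sketch on a made-up three-row DataFrame:

```python
import pandas as pd

# Hypothetical three-row table used only for illustration.
df = pd.DataFrame({'points': [87, 90, 92]})

# A scalar is broadcast to every row.
df['critic'] = 'everyone'

# An iterable is assigned element by element; its length must match the row count.
df['index_backwards'] = range(len(df), 0, -1)

critic_values = list(df['critic'])
backwards_values = list(df['index_backwards'])
```

Here `range(len(df), 0, -1)` counts down from the number of rows to 1, so the first row gets 3 and the last row gets 1.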

 

Reposted from: https://www.cnblogs.com/tszr/p/11233831.html
