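pandas Exercises: The Final Part
Last weekend we went back over some ways of handling time, along with the basic operations. Starting today, I am going to wrap up this pandas exercise series.
First, the link:
andre4life/pandas_exercises
Since I had just reviewed the time-related libraries last week, I changed the order and did Time_Series first.
The first few questions are still very basic.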
Step 3. Assign it to a variable apple
import pandas as pd
url = 'https://raw.githubusercontent.com/guipsamora/pandas_exercises/master/09_Time_Series/Apple_Stock/appl_1980_2014.csv'
apple = pd.read_csv(url)
apple.head()
Step 4. Check out the type of the columns
apple.dtypes
Step 5. Transform the Date column as a datetime type
This is exactly what we covered over the weekend; the main tool is to_datetime.
apple.Date = pd.to_datetime(apple.Date)
apple.head()
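But head() alone does not prove that the conversion succeeded, so I reuse the approach from the previous step and check the dtypes.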
apple.dtypes
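To make the dtype check concrete, here is a minimal sketch on a made-up three-row frame. The name `sample` and its values are my own illustration, not the Apple data. It tests the column before and after `pd.to_datetime` with `is_datetime64_any_dtype`, so the result does not have to be read off by eye.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype

# Hypothetical miniature of the stock table; the values are made up.
sample = pd.DataFrame({
    "Date": ["2014-07-08", "2014-07-07", "2014-07-03"],
    "Close": [95.35, 95.97, 94.03],
})

# The column holds plain strings at this point, so this is not a datetime dtype.
was_datetime = is_datetime64_any_dtype(sample["Date"])

sample["Date"] = pd.to_datetime(sample["Date"])
is_datetime = is_datetime64_any_dtype(sample["Date"])

print(was_datetime, is_datetime)
```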
Step 6. Set the date as the index
This one needs set_index.
apple = apple.set_index('Date')
apple.head()
You can see that Date has now become the index.
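As a side note, Steps 5 and 6 can probably be folded into the read itself. The sketch below is an alternative I am adding, not the exercise's solution: it feeds a small made-up CSV string (`csv_text` is hypothetical) to `pd.read_csv` with `parse_dates` and `index_col`, so the frame arrives with a datetime index already in place.

```python
import io
import pandas as pd

# Hypothetical CSV text standing in for the downloaded file.
csv_text = """Date,Close
2014-07-08,95.35
2014-07-07,95.97
"""

# Parse the Date column as datetimes and use it as the index in one call.
frame = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"], index_col="Date")

print(type(frame.index).__name__)
print(frame.index.name)
```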
Step 7. Is there any duplicate dates
Analysis: the date has already been set as the index, so the real question is whether any index value is repeated. That calls for index.is_unique.
apple.index.is_unique
The result is True, so there are no duplicate dates here.
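The Apple data has no duplicates, so here is a small sketch of what the check looks like when there are some. The frame below is made up for illustration; `index.duplicated` is used to locate the repeated rows and drop them.

```python
import pandas as pd

# Hypothetical frame in which 2014-07-07 appears twice.
idx = pd.to_datetime(["2014-07-08", "2014-07-07", "2014-07-07"])
frame = pd.DataFrame({"Close": [95.35, 95.97, 95.97]}, index=idx)

has_unique_dates = frame.index.is_unique

# Boolean mask marking every repeat after the first occurrence.
dup_mask = frame.index.duplicated(keep="first")

# Keep only the first row for each date.
deduped = frame[~dup_mask]

print(has_unique_dates, len(deduped))
```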
Step 8. Ops...it seems the index is from the most recent date. Make the first entry the oldest date.
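The task is to order the dates from oldest to newest, which needs sort_index with ascending=True. sort_index returns a new DataFrame, so the result is assigned back to apple.
apple = apple.sort_index(ascending=True)
apple.head()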
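One detail that is easy to miss: `sort_index` does not modify the frame in place by default. A minimal sketch on made-up data (the `frame` values are hypothetical) shows the difference between calling it and keeping its result.

```python
import pandas as pd

# Hypothetical frame with the newest date first, like the raw file.
idx = pd.to_datetime(["2014-07-08", "2014-07-07", "2014-07-03"])
frame = pd.DataFrame({"Close": [95.35, 95.97, 94.03]}, index=idx)

# The sorted copy is discarded here, so frame itself is unchanged.
frame.sort_index(ascending=True)
still_newest_first = frame.index[0] == pd.Timestamp("2014-07-08")

# Assigning the result keeps the sorted order.
frame = frame.sort_index(ascending=True)
oldest_first = frame.index.is_monotonic_increasing

print(still_newest_first, oldest_first)
```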
Step 9. Get the last business day of each month
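The task asks for the last business day of each month, so this needs resample, which we studied earlier.
The alias for business month end is BM (pandas 2.2 and later prefer BME). Note that .mean() returns each month's average, labeled with the last business day, rather than the values recorded on that day.
apple_month = apple.resample('BM').mean()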
apple_month.head()
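To see what the monthly resample actually returns, here is a sketch on made-up data covering January and February 2014. It uses the offset object `pd.offsets.BMonthEnd()` instead of a string alias, which is my own choice to sidestep the alias rename across pandas versions. It compares `.mean()` (the month's average) with `.last()` (the row from the last available business day).

```python
import pandas as pd

# Hypothetical prices: one value per business day, counting up from 0.
days = pd.bdate_range("2014-01-01", "2014-02-28")
frame = pd.DataFrame({"Close": [float(i) for i in range(len(days))]}, index=days)

by_month = frame.resample(pd.offsets.BMonthEnd())

monthly_mean = by_month.mean()     # average over each month
month_end_rows = by_month.last()   # last observed value in each month

print(monthly_mean)
print(month_end_rows)
```

Both results are labeled with the last business day of the month, which is why the mean version can look like it answers the question when it really returns averages.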
Step 10. What is the difference in days between the first day and the oldest
Compute the number of days between the earliest date and the latest date.
(apple.index.max() - apple.index.min()).days
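The subtraction works because two timestamps subtract to a `Timedelta`, and `.days` reads off its whole-day part. A tiny sketch with made-up dates (not the Apple range):

```python
import pandas as pd

# Hypothetical index spanning January and February 2014.
idx = pd.to_datetime(["2014-03-01", "2014-02-10", "2014-01-01"])

span = idx.max() - idx.min()   # a pandas Timedelta
print(type(span).__name__)
print(span.days)               # 31 days in January + 28 in February
```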
Step 11. How many months in the data we have
This asks how many months the data covers in total; resample works here as well.
apple_months = apple.resample('BM').mean()
len(apple_months.index)
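Counting resample bins gives every month in the span, including months with no rows. If the question is how many months actually contain data, a different count is needed. The sketch below is my own comparison on made-up data with an empty February, using `to_period("M").nunique()` for the second count.

```python
import pandas as pd

# Hypothetical data: two rows in January, none in February, one in March.
idx = pd.to_datetime(["2014-01-06", "2014-01-20", "2014-03-03"])
frame = pd.DataFrame({"Close": [1.0, 2.0, 3.0]}, index=idx)

# One bin per calendar month between the first and last date.
months_in_span = len(frame.resample(pd.offsets.MonthEnd()).size())

# Only the distinct months that actually appear in the index.
months_with_data = frame.index.to_period("M").nunique()

print(months_in_span, months_with_data)
```

For the Apple data the two counts should agree as long as every month has at least one trading day.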
OK, let me move on to the next part.
Investor - Flow of Funds - US
Step 3. Assign it to a variable called df
url = 'https://raw.githubusercontent.com/datasets/investor-flow-of-funds-us/master/data/weekly.csv'
df = pd.read_csv(url)
df.head()
Step 4. What is the frequency of the dataset?
weekly
Step 5. Set the column Date as the index.
df = df.set_index('Date')
df.head()
Step 6. What is the type of the index?
df.index
Step 7. Set the index to a DatetimeIndex type
df.index = pd.to_datetime(df.index)
type(df.index)
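Step 7 matters because resample only works on a datetime-like index. The sketch below uses a made-up two-row frame (the `Total` column name is hypothetical) to show the failure with a string index and the fix with `pd.to_datetime`.

```python
import pandas as pd

# Hypothetical frame whose index still holds date strings.
frame = pd.DataFrame({"Total": [1.0, 2.0]}, index=["2014-01-08", "2014-01-15"])

try:
    frame.resample(pd.offsets.MonthEnd()).sum()
    resample_failed = False
except TypeError:
    # resample rejects an index that is not datetime-like.
    resample_failed = True

# Convert the index, after which resample is allowed.
frame.index = pd.to_datetime(frame.index)
monthly_total = frame.resample(pd.offsets.MonthEnd()).sum()

print(resample_failed, type(frame.index).__name__)
```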
Step 8. Change the frequency to monthly, sum the values and assign it to monthly.
monthly = df.resample('M').sum()  # newer pandas versions spell this alias 'ME'
monthly
Step 9. You will notice that it filled the dataFrame with months that don't have any data with NaN. Let's drop these rows.
monthly = monthly.dropna()
monthly
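A caveat on Step 9: as far as I know, since pandas 0.22 the sum of an empty group is 0 rather than NaN, so on a recent version dropna may not remove anything after a plain .sum(). The sketch below is my own illustration on made-up weekly rows with an empty February; passing `min_count=1` restores NaN for empty months so that dropna has something to drop.

```python
import pandas as pd

# Hypothetical weekly rows: two in January, none in February, one in March.
idx = pd.to_datetime(["2014-01-08", "2014-01-15", "2014-03-05"])
frame = pd.DataFrame({"Total": [10.0, 5.0, 7.0]}, index=idx)

by_month = frame.resample(pd.offsets.MonthEnd())

default_sum = by_month.sum()            # empty February sums to 0
with_nan = by_month.sum(min_count=1)    # empty February becomes NaN
cleaned = with_nan.dropna()             # drops the empty month

print(default_sum)
print(cleaned)
```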
Step 10. Good, now we have the monthly data. Now change the frequency to year.
year = monthly.resample('AS-JAN').sum()  # newer pandas versions spell this alias 'YS-JAN'
year
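For the yearly step, here is a minimal sketch on made-up monthly totals. It uses `pd.offsets.YearBegin()` rather than a string alias, which is my substitution to stay independent of the alias rename; each year's sum is labeled with January 1 of that year.

```python
import pandas as pd

# Hypothetical monthly totals straddling a year boundary.
idx = pd.to_datetime(["2013-11-30", "2013-12-31", "2014-01-31", "2014-02-28"])
monthly_totals = pd.DataFrame({"Total": [1.0, 2.0, 3.0, 4.0]}, index=idx)

# One bin per calendar year, labeled by the first day of the year.
yearly_totals = monthly_totals.resample(pd.offsets.YearBegin()).sum()

print(yearly_totals)
```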
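OK, that covers the time series part of the exercises. Relatively speaking, the questions are quite easy.
The pandas exercise series is finally finished; all in all it took about two weeks.
I went to an interview today, and during it I felt like a complete rookie. The revolution has not yet succeeded, comrades, so the hard work continues!
Next, I hope to spend 10 days building two complete projects, and then start sending out resumes again!
Keep going!
Sure you won't leave a like to cheer me up?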