본문 바로가기
DataFrame

컬럼 shift

by 자동매매 2022. 4. 9.

출처 : 금융 데이터 분석을 위한 파이썬 판다스

 

 

금융 데이터 분석을 위한 파이썬 판다스

최근 인공지능 AI(Artificial Intelligence)이 보급화되면서 방대한 양의 데이터를 처리하는 방식이 중요해지기 시작했습니다. 판다스(Pandas)는 오픈 소 ...

wikidocs.net

예제파일

ss_ex_1.xlsx
0.01MB

 

 

shift함수: 상하 이동

import pandas as pd
df = pd.read_excel("DB/ss_ex_1.xlsx" , index_col=0)
df.index = pd.to_datetime(df.index)
df = df.sort_index()

df1 = df["거래량"].shift(1)
print(df1)

 

 

일자
2021-02-15           NaN
2021-02-16    23529706.0
2021-02-17    20483100.0
2021-02-18    18307735.0
2021-02-19    21327683.0
                 ...    
2021-08-09    13342623.0
2021-08-10    15522581.0
2021-08-11    20362639.0
2021-08-12    30241137.0
2021-08-13    42365223.0
Name: 거래량, Length: 127, dtype: float64

Process finished with exit code 0

 

df["전일거래량"] = df["거래량"].shift(1)  # 음수 - 상향 이동
df1 = df[ ['거래량', '전일거래량'] ]
print(df1)

                   거래량       전일거래량
일자                                
2021-02-15  23529706.0         NaN
2021-02-16  20483100.0  23529706.0
2021-02-17  18307735.0  20483100.0
2021-02-18  21327683.0  18307735.0
2021-02-19  25880879.0  21327683.0
...                ...         ...
2021-08-09  15522581.0  13342623.0
2021-08-10  20362639.0  15522581.0
2021-08-11  30241137.0  20362639.0
2021-08-12  42365223.0  30241137.0
2021-08-13  61270643.0  42365223.0

[127 rows x 2 columns]

df["전일거래량"] = df["거래량"].shift(1)
cond = df["거래량"] > df["전일거래량"]
df2=df[cond]
print(df2)

                 종가      대비   등락률  ...          시가총액         상장주식수       전일거래량
일자                                 ...                                        
2021-02-18  82100.0 -1100.0 -1.32  ...  4.901191e+14  5.969783e+09  18307735.0
2021-02-19  82600.0   500.0  0.61  ...  4.931040e+14  5.969783e+09  21327683.0
2021-02-24  82000.0     0.0  0.00  ...  4.895222e+14  5.969783e+09  20587314.0
2021-02-25  85300.0  3300.0  4.02  ...  5.092225e+14  5.969783e+09  26807651.0
2021-02-26  82500.0 -2800.0 -3.28  ...  4.925071e+14  5.969783e+09  34155986.0
...             ...     ...   ...  ...           ...           ...         ...
2021-08-09  81500.0     0.0  0.00  ...  4.865373e+14  5.969783e+09  13342623.0
2021-08-10  80200.0 -1300.0 -1.60  ...  4.787766e+14  5.969783e+09  15522581.0
2021-08-11  78500.0 -1700.0 -2.12  ...  4.686279e+14  5.969783e+09  20362639.0
2021-08-12  77000.0 -1500.0 -1.91  ...  4.596733e+14  5.969783e+09  30241137.0
2021-08-13  74400.0 -2600.0 -3.38  ...  4.441518e+14  5.969783e+09  42365223.0

[66 rows x 11 columns]

 

diff함수: 이전 인덱스 데이터와의 차이

print("영업일:", len(df2))  # 영업일: 66
df3=df2['거래량'].diff( ) # 이전 인덱스 데이터와의 차이
print(df3)

일자
2021-02-18           NaN
2021-02-19     4553196.0
2021-02-24      926772.0
2021-02-25     7348335.0
2021-02-26     4364814.0
                 ...    
2021-08-09   -10119787.0
2021-08-10     4840058.0
2021-08-11     9878498.0
2021-08-12    12124086.0
2021-08-13    18905420.0
Name: 거래량, Length: 66, dtype: float64

cond = df['거래량'].diff() > 0
df4=df[cond]
print(df4)

                 종가      대비   등락률  ...          시가총액         상장주식수       전일거래량
일자                                 ...                                        
2021-02-18  82100.0 -1100.0 -1.32  ...  4.901191e+14  5.969783e+09  18307735.0
2021-02-19  82600.0   500.0  0.61  ...  4.931040e+14  5.969783e+09  21327683.0
2021-02-24  82000.0     0.0  0.00  ...  4.895222e+14  5.969783e+09  20587314.0
2021-02-25  85300.0  3300.0  4.02  ...  5.092225e+14  5.969783e+09  26807651.0
2021-02-26  82500.0 -2800.0 -3.28  ...  4.925071e+14  5.969783e+09  34155986.0
...             ...     ...   ...  ...           ...           ...         ...
2021-08-09  81500.0     0.0  0.00  ...  4.865373e+14  5.969783e+09  13342623.0
2021-08-10  80200.0 -1300.0 -1.60  ...  4.787766e+14  5.969783e+09  15522581.0
2021-08-11  78500.0 -1700.0 -2.12  ...  4.686279e+14  5.969783e+09  20362639.0
2021-08-12  77000.0 -1500.0 -1.91  ...  4.596733e+14  5.969783e+09  30241137.0
2021-08-13  74400.0 -2600.0 -3.38  ...  4.441518e+14  5.969783e+09  42365223.0

[66 rows x 11 columns]

 

 

# 절대 모멘텀(momentum) 투자 전략

yeild = df['종가'] / df['종가'].shift(6)
cond = yeild >= 1.03
print(len(df[cond]))
cond_modified = cond.shift(1).fillna(False)
s = df.loc[cond_modified, '종가'] / df.loc[cond_modified, '시가']
print(s.cumprod().iloc[-1])

 

이동평균

import pandas as pd

df = pd.read_excel("DB/ss_ex_1.xlsx", index_col=0)
df.index = pd.to_datetime(df.index)
df = df.sort_index()[["종가"]]

df['종가D-1'] = df['종가'].shift(1)
df['종가D-2'] = df['종가'].shift(2)
df['ma3'] = (df['종가'] + df['종가D-1'] + df['종가D-2']) / 3
print(df.head())

                 종가    종가D-1    종가D-2           ma3
일자                                                 
2021-02-15  84200.0      NaN      NaN           NaN
2021-02-16  84900.0  84200.0      NaN           NaN
2021-02-17  83200.0  84900.0  84200.0  84100.000000
2021-02-18  82100.0  83200.0  84900.0  83400.000000
2021-02-19  82600.0  82100.0  83200.0  82633.333333

df['rolling3'] = df['종가'].rolling(3).mean()
print(df.head())

                 종가    종가D-1    종가D-2           ma3      rolling3
일자                                                               
2021-02-15  84200.0      NaN      NaN           NaN           NaN
2021-02-16  84900.0  84200.0      NaN           NaN           NaN
2021-02-17  83200.0  84900.0  84200.0  84100.000000  84100.000000
2021-02-18  82100.0  83200.0  84900.0  83400.000000  83400.000000
2021-02-19  82600.0  82100.0  83200.0  82633.333333  82633.333333

import pandas as pd

df = pd.read_excel("DB/ss_ex_1.xlsx", index_col=0)
df.index = pd.to_datetime(df.index)

df['ma5'] = df['종가'].rolling(5).mean().shift(1)

cond = df['ma5'] < df['시가']
print("상승일:", len(df[cond]))  # 상승일: 76
print("영업일:", len(df))        # 영업일: 127

 

지수이동평균(exponential moving average, EMA)

 :  최근 데이터에 높은 가중치를 부여하는 지수이동평균
# EMA(i) = k * price(i) + (1-k) * EMA(i-1)
# N : 일단위
# k=2/(N+1)

from pandas import Series

data  = [84200, 84900, 83200, 82100, 82600]
index = ["2021-02-15", "2021-02-16", "2021-02-17", "2021-02-18", "2021-02-19"]

s = Series(data, index)
s1 = s.ewm(span=3, adjust=False).mean()
print(s1)

2021-02-15    84200.00
2021-02-16    84550.00
2021-02-17    83875.00
2021-02-18    82987.50
2021-02-19    82793.75
dtype: float64

import pandas as pd
df = pd.read_excel("DB/ss_ex_1.xlsx" , index_col=0)
df.index = pd.to_datetime(df.index)
df = df.sort_index()
s=df['종가']
df['EWA'] = s.ewm(span=3,adjust=False).mean().shift(1)
cond = df['EWA'] < df['시가']
cond_modified = cond.fillna(False)
수익률 = df.loc[cond_modified,'종가']/df.loc[cond_modified,'시가']
print(수익률.cumprod().iloc[-1])  # 0.8724202765542763

 

'DataFrame' 카테고리의 다른 글

plot  (0) 2022.04.12
데이터 샘플링  (0) 2022.04.09
TimeStamp  (0) 2022.04.09
판다스 데이터프레임 (데이터 읽기/ 저장하기)  (0) 2022.04.07
판다스 데이터프레임 (고급기능2)  (0) 2022.04.07

댓글