Copying and Missing Values

References:

한 권으로 끝내는 <판다스 노트> ("Pandas Notes in One Volume")

https://e-koreatech.step.or.kr/

0. Copying

from IPython.display import Image
import numpy as np
import pandas as pd
import seaborn as sns

df = sns.load_dataset('titanic')
df.head()

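# copy() returns an independent DataFrame, so edits to df_copy do not touch df.
df_copy = df.copy()
df_copy.head()

# Overwrite the copy with 0; df.head() still shows the original values afterwards.
df_copy.loc[:, :] = 0
df.head()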
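
The point of copy() is easier to see next to plain assignment, which only creates a second name for the same object. Below is a minimal sketch on a small hand-made DataFrame; the `toy` frame and its values are made up for illustration and are not part of the titanic data.

```python
import pandas as pd

# Hypothetical toy data, used only to illustrate the behaviour.
toy = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})

alias = toy                 # no copy: both names refer to the same DataFrame
independent = toy.copy()    # copy() makes a deep copy by default

independent.loc[:, "a"] = 0     # changes only the copy
print(toy["a"].tolist())        # the original column is unchanged

alias.loc[0, "b"] = -1.0        # changes the original, because alias is toy
print(toy.loc[0, "b"])
```

If you plan to modify a DataFrame and still need the original later, taking a copy first is the safer habit.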

1. Checking for missing values - isnull(), isna() ↔ notnull(), notna()

- To count the missing values in each column, chain sum() onto the result.

df.isnull().sum()

survived         0
pclass           0
sex              0
age            177
sibsp            0
parch            0
fare             0
embarked         2
class            0
who              0
adult_male       0
deck           688
embark_town      2
alive            0
alone            0
dtype: int64

To get the total number of missing values in the whole DataFrame, call sum() twice.

df.isnull().sum().sum()
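
As a self-contained illustration of the counts above, here is a small sketch that does not need the titanic download. The `toy` frame is hypothetical; isna() is used here, and isnull() is an alias that returns the same result.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with a few missing cells.
toy = pd.DataFrame({
    "age": [22.0, np.nan, 35.0, np.nan],
    "deck": ["C", None, None, None],
    "fare": [7.25, 8.05, 53.1, 8.46],
})

missing_per_column = toy.isna().sum()    # True counts as 1 when summed
present_per_column = toy.notna().sum()   # the complement of isna()
total_missing = toy.isna().sum().sum()   # sum the per-column counts

print(missing_per_column)
print(total_missing)
```

The first sum() collapses each column of booleans into a count, and the second sum() adds those counts together.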

2. Filtering rows with missing data

df[df['age'].isna()]

Output (excerpt; the survived, pclass, sibsp, parch, embarked, and alive columns are not shown):

     sex     age  fare     class   who    adult_male  deck  embark_town  alone
     male    NaN  8.4583   Third   man    TRUE        NaN   Queenstown   TRUE
     male    NaN           Second  man    TRUE        NaN   Southampton  TRUE
     female  NaN  7.225    Third   woman  FALSE       NaN   Cherbourg    TRUE
     male    NaN  7.225    Third   man    TRUE        NaN   Cherbourg    TRUE
     female  NaN  7.8792   Third   woman  FALSE       NaN   Queenstown   TRUE
...
859  male    NaN  7.2292   Third   man    TRUE        NaN   Cherbourg    TRUE
863  female  NaN  69.55    Third   woman  FALSE       NaN   Southampton  FALSE
868  male    NaN  9.5      Third   man    TRUE        NaN   Southampton  TRUE
878  male    NaN  7.8958   Third   man    TRUE        NaN   Southampton  TRUE
888  female  NaN  23.45    Third   woman  FALSE       NaN   Southampton  FALSE

177 rows × 15 columns
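
The filter above works because isna() returns a boolean Series that can be used as a row mask. A minimal sketch on hypothetical data (the `toy` frame is made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two of the four ages are missing.
toy = pd.DataFrame({
    "sex": ["male", "female", "male", "female"],
    "age": [22.0, np.nan, np.nan, 35.0],
})

mask = toy["age"].isna()     # True where age is missing
missing_age = toy[mask]      # rows with a missing age
known_age = toy[~mask]       # the complement; same as toy[toy["age"].notna()]

print(missing_age)
print(known_age)
```

Negating the mask with ~ gives the rows that do have a value, which is often the subset you want for calculations.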

3. Filling missing values with another value - fillna()

df['age'].fillna(700).tail()

886     27.0
887     19.0
888    700.0
889     26.0
890     32.0
Name: age, dtype: float64

df1 = df.copy()
df1['age'] = df1['age'].fillna(700)
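
Note that fillna() returns a new object rather than changing the data in place, which is why the result is assigned back above. The sketch below shows this on hypothetical data, and also passes a dict to fill several columns with different values in one call; the `toy` frame and the fill value "Unknown" are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing value per column.
toy = pd.DataFrame({
    "age": [22.0, np.nan, 35.0],
    "embark_town": ["Southampton", None, "Cherbourg"],
})

filled_age = toy["age"].fillna(700)   # returns a new Series
print(toy["age"].isna().sum())        # the original still has its NaN

# A dict maps each column name to its own fill value.
filled = toy.fillna({"age": 700, "embark_town": "Unknown"})
print(filled)
```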

4. Filling with summary statistics

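# Start again from the original data: age in the previous df1 was already filled with 700,
# which would leave nothing to fill and would distort the mean and median.
df1 = df.copy()

df1['age'].fillna(df1['age'].mean()).tail()      # fill with the mean
df1['age'].fillna(df1['age'].median()).tail()    # fill with the median


# mode() returns a Series (there can be ties), so take the element at index 0 before filling.
df1['deck'].fillna(df1['deck'].mode()[0]).tail() # fill with the mode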
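
To make the three statistics concrete, here is a small sketch with values chosen so that the mean and median differ. The `toy` frame is hypothetical. It relies on mean() and median() skipping NaN by default, and on mode() returning a Series.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: known ages are 20, 30 and 70.
toy = pd.DataFrame({
    "age": [20.0, np.nan, 30.0, 70.0, np.nan],
    "deck": ["C", "B", None, "C", None],
})

mean_filled = toy["age"].fillna(toy["age"].mean())       # mean of 20, 30, 70
median_filled = toy["age"].fillna(toy["age"].median())   # median of 20, 30, 70

modes = toy["deck"].mode()                 # a Series, even with a single mode
mode_filled = toy["deck"].fillna(modes[0])

print(mean_filled.tolist())
print(median_filled.tolist())
print(mode_filled.tolist())
```

The median is less sensitive to extreme values such as the 70 here, so it can be the more conservative choice for skewed columns.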

5. Dropping rows that contain NaN (dropna)

The default is how='any'; the option accepts the following values.

any: drop the row if at least one value is NaN
all: drop the row only if every value is NaN

df1.dropna()

df1.dropna(how='all')
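
The difference between the two settings shows up clearly on a small frame. The sketch below uses hypothetical data; it also uses the subset parameter of dropna(), which is not covered above, to restrict the check to chosen columns.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: row 0 is complete, row 2 is entirely missing.
toy = pd.DataFrame({
    "age": [22.0, np.nan, np.nan, 35.0],
    "deck": ["C", "B", None, None],
})

any_dropped = toy.dropna()                   # how='any': keep only complete rows
all_dropped = toy.dropna(how="all")          # drop only rows where everything is missing
age_required = toy.dropna(subset=["age"])    # look at the age column only

print(any_dropped.index.tolist())
print(all_dropped.index.tolist())
print(age_required.index.tolist())
```

Like fillna(), dropna() returns a new DataFrame, so `toy` itself still has all four rows.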
