
[AI class day12] Python pandas TIL

makeitworth 2021. 5. 5. 23:46

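Notes: Following on from yesterday, I spent today studying the Python library pandas and practicing how to apply it.

As before, I had already covered this material in an earlier data analysis course, so it was mostly review.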

2. Wrangling data with Python: pandas

Let's work with DataFrames using pandas.

I. Getting started with pandas

import pandas as pd

II. Handling 1-D data with pandas - Series

1-D labeled array
You can specify the index yourself

s = pd.Series([1,4,9,16,25])
s

0     1
1     4
2     9
3    16
4    25
dtype: int64

# Because the index can be specified, a Series can also be created from a dictionary

t = pd.Series({'one':1, 'two': 2, 'three': 3, 'four': 4, 'five': 5})
t

one      1
two      2
three    3
four     4
five     5
dtype: int64
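The index can also be passed explicitly alongside a list of values. A minimal sketch; the variable name `u` and the labels "a" to "e" are made up for illustration:

```python
import pandas as pd

# Hypothetical example: the same values as `s`, with explicit string labels.
u = pd.Series([1, 4, 9, 16, 25], index=["a", "b", "c", "d", "e"])

print(u["c"])          # look up a value by its label
print(list(u.index))   # the labels supplied above
```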

Series vs. NumPy

A Series behaves much like an ndarray

s[1]

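# t also supports indexing and slicing.
# Positional access on a label-indexed Series is safer with .iloc; plain t[1] is deprecated in recent pandas.
t.iloc[1]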

t[1:3]

two      2
three    3
dtype: int64

s[s > s.median()]  # keep only the values greater than the Series' own median -> you can filter with a condition (a plain list cannot be indexed this way)

3    16
4    25
dtype: int64
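It may help to look at the condition on its own. The sketch below rebuilds `s` and stores the comparison in a variable named `mask` (a name chosen here for illustration) before using it as a filter:

```python
import pandas as pd

s = pd.Series([1, 4, 9, 16, 25])

# The comparison is evaluated element-wise and yields a boolean Series.
mask = s > s.median()      # the median of 1, 4, 9, 16, 25 is 9
print(mask.tolist())       # [False, False, False, True, True]

# Indexing with the mask keeps only the entries where it is True.
filtered = s[mask]
print(filtered.tolist())   # [16, 25]
```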

# You can also pass a list of the index labels you want.
s[[3,1,4]]

3    16
1     4
4    25
dtype: int64

# NumPy functions can be applied as well
import numpy as np
np.exp(s)

0    2.718282e+00
1    5.459815e+01
2    8.103084e+03
3    8.886111e+06
4    7.200490e+10
dtype: float64

s.dtype

dtype('int64')
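One detail worth noting is that a NumPy function applied to a Series returns a Series with the same index, even when the labels are strings. A small sketch with a shortened version of `t`; the name `r` is only for illustration:

```python
import numpy as np
import pandas as pd

t = pd.Series({"one": 1, "two": 2, "three": 3})

# Element-wise arithmetic followed by a NumPy ufunc; the labels are preserved.
r = np.sqrt(t * t)

print(type(r).__name__)   # Series
print(list(r.index))      # ['one', 'two', 'three']
print(r.dtype)            # float64, because sqrt produces floats
```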

Series vs. dict

A Series also behaves much like a dict

t

one      1
two      2
three    3
four     4
five     5
dtype: int64

# Just like looking up a value by key in a dictionary.
t['one']

# Adding a value to a Series also works the same way as with a dictionary.
t['six'] = 6
t

one      1
two      2
three    3
four     4
five     5
six      6
dtype: int64

# The in operator works the same way (it checks the index labels)
'six' in t

True

'seven' in t

False

# Accessing a key that does not exist raises KeyError
t['seven']

---------------------------------------------------------------------------

KeyError                                  Traceback (most recent call last)

/opt/anaconda3/lib/python3.8/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2894             try:
-> 2895                 return self._engine.get_loc(casted_key)
   2896             except KeyError as err:


pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()


pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()


pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()


pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()


KeyError: 'seven'


The above exception was the direct cause of the following exception:


KeyError                                  Traceback (most recent call last)

<ipython-input-18-233205f0cca0> in <module>
----> 1 t['seven']


/opt/anaconda3/lib/python3.8/site-packages/pandas/core/series.py in __getitem__(self, key)
    880 
    881         elif key_is_scalar:
--> 882             return self._get_value(key)
    883 
    884         if is_hashable(key):


/opt/anaconda3/lib/python3.8/site-packages/pandas/core/series.py in _get_value(self, label, takeable)
    987 
    988         # Similar to Index.get_value, but we do not fall back to positional
--> 989         loc = self.index.get_loc(label)
    990         return self.index._get_values_for_loc(self, loc, label)
    991 


/opt/anaconda3/lib/python3.8/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2895                 return self._engine.get_loc(casted_key)
   2896             except KeyError as err:
-> 2897                 raise KeyError(key) from err
   2898 
   2899         if tolerance is not None:


KeyError: 'seven'

# Use get instead (it returns None when the key is missing, or a default value if you supply one)
t.get('seven')

t.get('seven', 0)
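Since the two calls above show no output in the notebook, here is a small self-contained sketch of what `get` returns. The variable names are chosen for illustration:

```python
import pandas as pd

t = pd.Series({"one": 1, "two": 2})

missing = t.get("seven")       # key absent, no default -> None
fallback = t.get("seven", 0)   # key absent, default supplied -> 0
present = t.get("one")         # key present -> the stored value

print(missing, fallback, present)
```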

Naming a Series

Every Series has a name attribute
The name can be set when the Series is first created

s = pd.Series(np.random.randn(5), name = "random_nums")
s

0    1.186059
1    2.019542
2    0.751416
3   -1.176117
4   -0.726515
Name: random_nums, dtype: float64

# Access the name with .name
s.name == 'random_nums'

True

# The name can also be changed

s.name = "임의의 난수"  # Korean for "arbitrary random numbers"
s

0    1.186059
1    2.019542
2    0.751416
3   -1.176117
4   -0.726515
Name: 임의의 난수, dtype: float64

III. Handling 2-D data with pandas - DataFrame

2-D labeled table
You can specify the index yourself

# Creating a DataFrame
# 1. Using a dictionary (we built our first Series from a list, but building a DataFrame from lists alone is cumbersome)
d = {"height":[1,2,3,4], "weight":[30, 40, 50, 60]}

df = pd.DataFrame(d)
df

	height	weight
0	1	30
1	2	40
2	3	50
3	4	60

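# Unlike a NumPy array, a DataFrame can hold several data types, so it is worth checking which types it contains
# Checking the dtype (for NumPy it is ndarray.dtype): in a DataFrame each column can differ --> use the plural dtypes attribute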
df.dtypes

height    int64
weight    int64
dtype: object
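The example above has two integer columns, so the per-column nature of `dtypes` is easy to miss. A minimal sketch with a made-up frame `people` whose columns hold different types:

```python
import pandas as pd

# Hypothetical data: one text column, one float column, one integer column.
people = pd.DataFrame({
    "name": ["a", "b"],
    "height": [1.5, 1.7],
    "age": [30, 40],
})

# One dtype per column. Text columns are reported as object or as a
# dedicated string dtype, depending on the pandas version.
print(people.dtypes)
```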

From CSV to DataFrame

Existing data can be loaded and turned into a DataFrame
CSV (comma-separated values)
Use read_csv()

# If country_wise_latest.csv is in the same directory:
covid = pd.read_csv("./country_wise_latest.csv")
covid

	Country/Region	Confirmed	Deaths	Recovered	Active	New cases	New deaths	New recovered	Deaths / 100 Cases	Recovered / 100 Cases	Deaths / 100 Recovered	Confirmed last week	1 week change	1 week % increase	WHO Region
0	Afghanistan	36263	1269	25198	9796	106	10	18	3.50	69.49	5.04	35526	737	2.07	Eastern Mediterranean
1	Albania	4880	144	2745	1991	117	6	63	2.95	56.25	5.25	4171	709	17.00	Europe
2	Algeria	27973	1163	18837	7973	616	8	749	4.16	67.34	6.17	23691	4282	18.07	Africa
3	Andorra	907	52	803	52	10	0	0	5.73	88.53	6.48	884	23	2.60	Europe
4	Angola	950	41	242	667	18	1	0	4.32	25.47	16.94	749	201	26.84	Africa
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
182	West Bank and Gaza	10621	78	3752	6791	152	2	0	0.73	35.33	2.08	8916	1705	19.12	Eastern Mediterranean
183	Western Sahara	10	1	8	1	0	0	0	10.00	80.00	12.50	10	0	0.00	Africa
184	Yemen	1691	483	833	375	10	4	36	28.56	49.26	57.98	1619	72	4.45	Eastern Mediterranean
185	Zambia	4552	140	2815	1597	71	1	465	3.08	61.84	4.97	3326	1226	36.86	Africa
186	Zimbabwe	2704	36	542	2126	192	2	24	1.33	20.04	6.64	1713	991	57.85	Africa

187 rows × 15 columns
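If the CSV file is not at hand, `read_csv()` can be tried on an in-memory string instead. The sketch below assumes a three-row excerpt with only three of the columns shown above; `csv_text` and `sample` are names made up for this example:

```python
import io
import pandas as pd

# A small excerpt in CSV form; the first line becomes the column names.
csv_text = """Country/Region,Confirmed,WHO Region
Afghanistan,36263,Eastern Mediterranean
Albania,4880,Europe
Algeria,27973,Africa
"""

# read_csv accepts a file path or any file-like object.
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)   # (3, 3)
print(sample)
```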

Using pandas 1. Inspecting part of the data

head(n): look at the first n rows
tail(n): look at the last n rows

# Inspect the first 5 rows from the top
covid.head(5)

	Country/Region	Confirmed	Deaths	Recovered	Active	New cases	New deaths	New recovered	Deaths / 100 Cases	Recovered / 100 Cases	Deaths / 100 Recovered	Confirmed last week	1 week change	1 week % increase	WHO Region
0	Afghanistan	36263	1269	25198	9796	106	10	18	3.50	69.49	5.04	35526	737	2.07	Eastern Mediterranean
1	Albania	4880	144	2745	1991	117	6	63	2.95	56.25	5.25	4171	709	17.00	Europe
2	Algeria	27973	1163	18837	7973	616	8	749	4.16	67.34	6.17	23691	4282	18.07	Africa
3	Andorra	907	52	803	52	10	0	0	5.73	88.53	6.48	884	23	2.60	Europe
4	Angola	950	41	242	667	18	1	0	4.32	25.47	16.94	749	201	26.84	Africa

covid.tail(5)

	Country/Region	Confirmed	Deaths	Recovered	Active	New cases	New deaths	New recovered	Deaths / 100 Cases	Recovered / 100 Cases	Deaths / 100 Recovered	Confirmed last week	1 week change	1 week % increase	WHO Region
182	West Bank and Gaza	10621	78	3752	6791	152	2	0	0.73	35.33	2.08	8916	1705	19.12	Eastern Mediterranean
183	Western Sahara	10	1	8	1	0	0	0	10.00	80.00	12.50	10	0	0.00	Africa
184	Yemen	1691	483	833	375	10	4	36	28.56	49.26	57.98	1619	72	4.45	Eastern Mediterranean
185	Zambia	4552	140	2815	1597	71	1	465	3.08	61.84	4.97	3326	1226	36.86	Africa
186	Zimbabwe	2704	36	542	2126	192	2	24	1.33	20.04	6.64	1713	991	57.85	Africa

Using pandas 2. Accessing data

df['column_name'] or df.column_name

covid['Active']

0      9796
1      1991
2      7973
3        52
4       667
       ... 
182    6791
183       1
184     375
185    1597
186    2126
Name: Active, Length: 187, dtype: int64

covid.Active

0      9796
1      1991
2      7973
3        52
4       667
       ... 
182    6791
183       1
184     375
185    1597
186    2126
Name: Active, Length: 187, dtype: int64

Difference between the two: if the column name contains a space, attribute-style access does not work

covid.WHO Region

  File "<ipython-input-40-1174247fd45e>", line 1
    covid.WHO Region
              ^
SyntaxError: invalid syntax

covid['WHO Region']

0      Eastern Mediterranean
1                     Europe
2                     Africa
3                     Europe
4                     Africa
               ...          
182    Eastern Mediterranean
183                   Africa
184    Eastern Mediterranean
185                   Africa
186                   Africa
Name: WHO Region, Length: 187, dtype: object
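Spaces are not the only case where attribute access falls short: a column whose name matches an existing DataFrame method is also reachable only with brackets. A small sketch; the frame `df` and its `count` column are hypothetical:

```python
import pandas as pd

df = pd.DataFrame({
    "Active": [9796, 1991],
    "WHO Region": ["Eastern Mediterranean", "Europe"],
    "count": [1, 2],   # hypothetical column that shares a name with DataFrame.count
})

# For a simple column name both styles return the same Series.
same = df["Active"].equals(df.Active)
print(same)                    # True

# Bracket access works for any column name.
print(df["WHO Region"].tolist())
print(df["count"].tolist())    # [1, 2]

# Attribute access resolves to the method, not the column.
print(callable(df.count))      # True
```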

# Each column of a DataFrame is a Series -> it can be used in all the same ways
type(covid['Confirmed'])

pandas.core.series.Series

# Indexing works
covid['Confirmed'][0]

# Slicing works too
covid['Confirmed'][1:5]

1     4880
2    27973
3      907
4      950
Name: Confirmed, dtype: int64
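`covid['Confirmed'][0]` works because the default index labels happen to be 0, 1, 2, and so on: on an integer-labeled Series, plain `[]` with an integer looks up a label, not a position. The difference shows up once rows have been filtered out. A sketch using the first five Confirmed values from the table above; `confirmed` and `big` are names made up here:

```python
import pandas as pd

confirmed = pd.Series([36263, 4880, 27973, 907, 950], name="Confirmed")

# Filtering keeps the original labels, so they are no longer 0, 1, 2, ...
big = confirmed[confirmed > 5000]
print(big.index.tolist())   # [0, 2]

# .iloc selects by position, .loc selects by label.
print(big.iloc[1])          # 27973 (second remaining row)
print(big.loc[2])           # 27973 (row labeled 2)
```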

Using pandas 3. Accessing data with a "condition"

# Find countries with more than 100 new cases
covid['New cases'] > 100

0       True
1       True
2       True
3      False
4      False
       ...  
182     True
183    False
184    False
185    False
186     True
Name: New cases, Length: 187, dtype: bool

covid[covid['New cases']>100]

	Country/Region	Confirmed	Deaths	Recovered	Active	New cases	New deaths	New recovered	Deaths / 100 Cases	Recovered / 100 Cases	Deaths / 100 Recovered	Confirmed last week	1 week change	1 week % increase	WHO Region
0	Afghanistan	36263	1269	25198	9796	106	10	18	3.50	69.49	5.04	35526	737	2.07	Eastern Mediterranean
1	Albania	4880	144	2745	1991	117	6	63	2.95	56.25	5.25	4171	709	17.00	Europe
2	Algeria	27973	1163	18837	7973	616	8	749	4.16	67.34	6.17	23691	4282	18.07	Africa
6	Argentina	167416	3059	72575	91782	4890	120	2057	1.83	43.35	4.21	130774	36642	28.02	Americas
8	Australia	15303	167	9311	5825	368	6	137	1.09	60.84	1.79	12428	2875	23.13	Western Pacific
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
177	United Kingdom	301708	45844	1437	254427	688	7	3	15.19	0.48	3190.26	296944	4764	1.60	Europe
179	Uzbekistan	21209	121	11674	9414	678	5	569	0.57	55.04	1.04	17149	4060	23.67	Europe
180	Venezuela	15988	146	9959	5883	525	4	213	0.91	62.29	1.47	12334	3654	29.63	Americas
182	West Bank and Gaza	10621	78	3752	6791	152	2	0	0.73	35.33	2.08	8916	1705	19.12	Eastern Mediterranean
186	Zimbabwe	2704	36	542	2126	192	2	24	1.33	20.04	6.64	1713	991	57.85	Africa

82 rows × 15 columns

covid[covid['New cases']>100].head(5)

	Country/Region	Confirmed	Deaths	Recovered	Active	New cases	New deaths	New recovered	Deaths / 100 Cases	Recovered / 100 Cases	Deaths / 100 Recovered	Confirmed last week	1 week change	1 week % increase	WHO Region
0	Afghanistan	36263	1269	25198	9796	106	10	18	3.50	69.49	5.04	35526	737	2.07	Eastern Mediterranean
1	Albania	4880	144	2745	1991	117	6	63	2.95	56.25	5.25	4171	709	17.00	Europe
2	Algeria	27973	1163	18837	7973	616	8	749	4.16	67.34	6.17	23691	4282	18.07	Africa
6	Argentina	167416	3059	72575	91782	4890	120	2057	1.83	43.35	4.21	130774	36642	28.02	Americas
8	Australia	15303	167	9311	5825	368	6	137	1.09	60.84	1.79	12428	2875	23.13	Western Pacific

# Find countries whose WHO region is South-East Asia
# 1. Check the distinct values of 'WHO Region'
covid['WHO Region'].unique()

array(['Eastern Mediterranean', 'Europe', 'Africa', 'Americas',
       'Western Pacific', 'South-East Asia'], dtype=object)

# 2. Having confirmed that the value we want is 'South-East Asia', write the condition
covid[covid['WHO Region'] == 'South-East Asia']

	Country/Region	Confirmed	Deaths	Recovered	Active	New cases	New deaths	New recovered	Deaths / 100 Cases	Recovered / 100 Cases	Deaths / 100 Recovered	Confirmed last week	1 week change	1 week % increase	WHO Region
13	Bangladesh	226225	2965	125683	97577	2772	37	1801	1.31	55.56	2.36	207453	18772	9.05	South-East Asia
19	Bhutan	99	0	86	13	4	0	1	0.00	86.87	0.00	90	9	10.00	South-East Asia
27	Burma	350	6	292	52	0	0	2	1.71	83.43	2.05	341	9	2.64	South-East Asia
79	India	1480073	33408	951166	495499	44457	637	33598	2.26	64.26	3.51	1155338	324735	28.11	South-East Asia
80	Indonesia	100303	4838	58173	37292	1525	57	1518	4.82	58.00	8.32	88214	12089	13.70	South-East Asia
106	Maldives	3369	15	2547	807	67	0	19	0.45	75.60	0.59	2999	370	12.34	South-East Asia
119	Nepal	18752	48	13754	4950	139	3	626	0.26	73.35	0.35	17844	908	5.09	South-East Asia
158	Sri Lanka	2805	11	2121	673	23	0	15	0.39	75.61	0.52	2730	75	2.75	South-East Asia
167	Thailand	3297	58	3111	128	6	0	2	1.76	94.36	1.86	3250	47	1.45	South-East Asia
168	Timor-Leste	24	0	0	24	0	0	0	0.00	0.00	0.00	24	0	0.00	South-East Asia

Using pandas 4. Accessing data by row

# Example data - library information

books_dict = {"Available":[True, True, False], "Location":[102, 215, 323], "Genre": ["Programming", "Physics", "Math"]}
books_df = pd.DataFrame(books_dict, index = ["버그란 무엇인가","두근두근 물리학","미분해줘 홈즈"])
books_df

	Available	Location	Genre
버그란 무엇인가	True	102	Programming
두근두근 물리학	True	215	Physics
미분해줘 홈즈	False	323	Math

Fetching by index label: `.loc[row,col]`

books_df.loc["버그란 무엇인가"]

Available           True
Location             102
Genre        Programming
Name: 버그란 무엇인가, dtype: object

# The result of indexing by row is also a Series

type(books_df.loc["버그란 무엇인가"])

pandas.core.series.Series

# Can we tell whether the book '미분해줘 홈즈' is available to borrow?
books_df.loc["미분해줘 홈즈"]['Available']

False

books_df.loc["미분해줘 홈즈",'Available']

False

Fetching by integer position: `.iloc[rowidx,colidx]` (integers)

# Get row 0, column 1
books_df.iloc[0,1]

# Get row 1, columns 0 through 1
books_df.iloc[1,0:2]

Available    True
Location      215
Name: 두근두근 물리학, dtype: object
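One difference between the two accessors is easy to trip over: a slice in `.loc` includes its end label, while a slice in `.iloc` excludes its end position, like ordinary Python slicing. A minimal sketch with a reduced copy of the books table; the short labels "A", "B", "C" are placeholders for the titles:

```python
import pandas as pd

books = pd.DataFrame(
    {"Available": [True, True, False], "Location": [102, 215, 323]},
    index=["A", "B", "C"],   # hypothetical short titles
)

# Label slice: both "A" and "B" are included.
by_label = books.loc["A":"B", "Location"]
print(by_label.tolist())      # [102, 215]

# Position slice: row 0 only, position 1 is excluded.
by_position = books.iloc[0:1, 1]
print(by_position.tolist())   # [102]
```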

Using pandas 5. groupby

Split: divide the DataFrame according to a given "criterion"
Apply: reduce each group with a statistical function such as sum(), mean(), or median()
Combine: build a new Series from the applied results (group_key: applied_value)

# We want to see the number of confirmed cases per WHO Region

# 1. Extract only the confirmed-cases column from covid
# 2. Group it by covid's WHO Region column.

covid_by_region = covid['Confirmed'].groupby(by = covid["WHO Region"])
covid_by_region 
# Only the split step has been applied so far, so a groupby object is printed -> applying an aggregation performs the combine step right away.

<pandas.core.groupby.generic.SeriesGroupBy object at 0x11b9381f0>

covid_by_region.sum()

WHO Region
Africa                    723207
Americas                 8839286
Eastern Mediterranean    1490744
Europe                   3299523
South-East Asia          1835297
Western Pacific           292428
Name: Confirmed, dtype: int64

# To see the average number of confirmed cases per country in each region
covid_by_region.mean()

WHO Region
Africa                    15066.812500
Americas                 252551.028571
Eastern Mediterranean     67761.090909
Europe                    58920.053571
South-East Asia          183529.700000
Western Pacific           18276.750000
Name: Confirmed, dtype: float64
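The same split-apply-combine steps can also be written by grouping the DataFrame by column name and then selecting the column to aggregate. This is a sketch on a four-row frame built from values in the tables above, not the full dataset; `agg` is used here to apply several functions at once:

```python
import pandas as pd

df = pd.DataFrame({
    "WHO Region": ["Europe", "Africa", "Europe", "Africa"],
    "Confirmed": [4880, 27973, 907, 950],
})

# Split: group rows by region, then pick the column to aggregate.
grouped = df.groupby("WHO Region")["Confirmed"]

# Apply + combine: one value per group, indexed by the group key.
totals = grouped.sum()
print(totals)

# Several aggregations at once give a DataFrame with one column per function.
summary = grouped.agg(["sum", "mean", "count"])
print(summary)
```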

Mission:

1. In the covid data, which country has the highest death rate per 100 cases (`Deaths / 100 Cases`)?

#1. Using .idxmax()
covid['Country/Region'][covid['Deaths / 100 Cases'].idxmax()]

'Yemen'

#2. Using sort_values()
covid.sort_values(by = ['Deaths / 100 Cases'], ascending=False).iloc[0,0]

'Yemen'
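`idxmax()` returns the index label of the largest value rather than the value itself, which is why it has to be fed back into a lookup. A sketch on three rows taken from the table above, using `.loc` for the lookup; `top_label` and `top_country` are illustrative names:

```python
import pandas as pd

df = pd.DataFrame({
    "Country/Region": ["Afghanistan", "Yemen", "Zambia"],
    "Deaths / 100 Cases": [3.50, 28.56, 3.08],
})

# Index label of the row holding the maximum value.
top_label = df["Deaths / 100 Cases"].idxmax()
print(top_label)      # 1

# Use that label to read the country name from the same row.
top_country = df.loc[top_label, "Country/Region"]
print(top_country)    # Yemen
```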

2. In the covid data, list every country with no new cases whose WHO Region is 'Europe'.

Hint: Applying two conditions at once on a single line may produce a warning.

covid[(covid['New cases'] == 0) & (covid['WHO Region'] == "Europe")]

	Country/Region	Confirmed	Deaths	Recovered	Active	New recovered	Deaths / 100 Cases	Recovered / 100 Cases	Deaths / 100 Recovered	Confirmed last week	1 week change	1 week % increase	WHO Region
56	Estonia	2034	69	1923	42	1	3.39	94.54	3.59	2021	13	0.64	Europe
75	Holy See	12	0	12	0	0	0.00	100.00	0.00	12	0	0.00	Europe
95	Latvia	1219	31	1045	143	0	2.54	85.73	2.97	1192	27	2.27	Europe
100	Liechtenstein	86	1	81	4	0	1.16	94.19	1.23	86	0	0.00	Europe
113	Monaco	116	4	104	8	0	3.45	89.66	3.85	109	7	6.42	Europe
143	San Marino	699	42	657	0	0	6.01	93.99	6.39	699	0	0.00	Europe
157	Spain	272421	28432	150376	93613	0	10.44	55.20	18.91	264836	7585	2.86	Europe
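The parentheses in the filter above matter: `&` binds more tightly than `==`, so each comparison has to be wrapped before the two are combined. An equivalent way to write it is to name each condition first. A sketch on three rows; `no_new` and `in_europe` are names made up for this example:

```python
import pandas as pd

df = pd.DataFrame({
    "Country/Region": ["Estonia", "Albania", "Western Sahara"],
    "New cases": [0, 117, 0],
    "WHO Region": ["Europe", "Europe", "Africa"],
})

# Each condition is a boolean Series of the same length as df.
no_new = df["New cases"] == 0
in_europe = df["WHO Region"] == "Europe"

# & combines them element-wise (use | for "or", ~ for "not").
result = df[no_new & in_europe]
print(result["Country/Region"].tolist())   # ['Estonia']
```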

3. Using the following data, what is the highest average price (AveragePrice) of avocados in each region?

avocado = pd.read_csv("./avocado.csv")
avocado.head(5)

	Unnamed: 0	Date	AveragePrice	Total Volume	4046	4225	4770	Total Bags	Small Bags	Large Bags	type	year	region
0	0	2015-12-27	1.33	64236.62	1036.74	54454.85	48.16	8696.87	8603.62	93.25	conventional	2015	Albany
1	1	2015-12-20	1.35	54876.98	674.28	44638.81	58.33	9505.56	9408.07	97.49	conventional	2015	Albany
2	2	2015-12-13	0.93	118220.22	794.70	109149.67	130.50	8145.35	8042.21	103.14	conventional	2015	Albany
3	3	2015-12-06	1.08	78992.15	1132.00	71976.41	72.58	5811.16	5677.40	133.76	conventional	2015	Albany
4	4	2015-11-29	1.28	51039.60	941.48	43838.39	75.78	6183.95	5986.26	197.69	conventional	2015	Albany

avocado['region'].unique()

array(['Albany', 'Atlanta', 'BaltimoreWashington', 'Boise', 'Boston',
       'BuffaloRochester', 'California', 'Charlotte', 'Chicago',
       'CincinnatiDayton', 'Columbus', 'DallasFtWorth', 'Denver',
       'Detroit', 'GrandRapids', 'GreatLakes', 'HarrisburgScranton',
       'HartfordSpringfield', 'Houston', 'Indianapolis', 'Jacksonville',
       'LasVegas', 'LosAngeles', 'Louisville', 'MiamiFtLauderdale',
       'Midsouth', 'Nashville', 'NewOrleansMobile', 'NewYork',
       'Northeast', 'NorthernNewEngland', 'Orlando', 'Philadelphia',
       'PhoenixTucson', 'Pittsburgh', 'Plains', 'Portland',
       'RaleighGreensboro', 'RichmondNorfolk', 'Roanoke', 'Sacramento',
       'SanDiego', 'SanFrancisco', 'Seattle', 'SouthCarolina',
       'SouthCentral', 'Southeast', 'Spokane', 'StLouis', 'Syracuse',
       'Tampa', 'TotalUS', 'West', 'WestTexNewMexico'], dtype=object)

averageprice_by_region = avocado['AveragePrice'].groupby(by = avocado['region'])

averageprice_by_region.max()
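The notebook output for this last cell is not shown, so here is a sketch of the same pattern on a tiny made-up frame. The Albany prices come from the rows above, the Boston prices are invented for illustration, and `sample`, `max_price`, and `ranked` are hypothetical names:

```python
import pandas as pd

sample = pd.DataFrame({
    "region": ["Albany", "Albany", "Boston", "Boston"],
    "AveragePrice": [1.33, 0.93, 1.10, 1.45],   # Boston values are made up
})

# Highest AveragePrice within each region.
max_price = sample.groupby("region")["AveragePrice"].max()
print(max_price)

# Sorting the result ranks the regions by their peak price.
ranked = max_price.sort_values(ascending=False)
print(ranked.index.tolist())   # ['Boston', 'Albany']
```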
