Pandas DataFrame and Series: Labeled Tables

What Pandas adds on top of NumPy

Pandas is the standard Python library for manipulating tabular data. Current stable release: Pandas 3.0.2 (March 2026). The 3.0 release in January 2026 was a big milestone: Copy-on-Write became the default, the PyArrow-backed string type went mainstream, and a long list of legacy APIs was removed.

NumPy gives you typed numeric arrays. Pandas adds two things on top: labels (named columns and indexed rows) and heterogeneous types (strings, datetimes, and categoricals in one table). It sounds boring, but in practice it's the difference between "raw numbers" and "data you can actually work with".
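To make the contrast concrete, here is a minimal sketch. The order data, the column names, and the `A00x` index labels are made up for illustration.

```python
import numpy as np
import pandas as pd

# NumPy: one dtype for the whole array, addressed by position only.
amounts = np.array([120.5, 87.3, 215.0])
first_amount = amounts[0]

# Pandas: named columns, a labeled row index, and a separate dtype per column.
orders = pd.DataFrame(
    {
        "amount_usd": [120.5, 87.3, 215.0],
        "ordered_at": pd.to_datetime(["2026-01-05", "2026-01-06", "2026-01-07"]),
        "status": pd.Categorical(["shipped", "pending", "shipped"]),
    },
    index=["A001", "A002", "A003"],  # hypothetical order IDs used as row labels
)

print(orders.dtypes)                     # float, datetime, and category columns
print(orders.loc["A002", "amount_usd"])  # 87.3
```

The lookup on the last line names both the row and the column, so it keeps working even if columns are reordered later.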

The two core types

Series: a single labeled column. Conceptually, a 1D NumPy array plus an index. Every DataFrame column is a Series.
DataFrame: a 2D labeled table. Each column can have its own type, and all columns share the same row index.
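A short sketch of how the two types relate, using a made-up three-row table (the labels and values are illustrative only):

```python
import pandas as pd

df = pd.DataFrame(
    {"amount_usd": [120.5, 87.3, 215.0], "status": ["shipped", "pending", "shipped"]},
    index=["A001", "A002", "A003"],  # hypothetical row labels
)

# Pulling out one column gives a Series that carries the DataFrame's index.
amounts = df["amount_usd"]
print(isinstance(amounts, pd.Series))   # True
print(amounts.index.equals(df.index))   # True
print(amounts.name)                     # amount_usd

# Underneath, the values are a 1D array.
print(amounts.to_numpy().ndim)          # 1

# A single row is also a Series, this time indexed by the column names.
row = df.loc["A002"]
print(list(row.index))                  # ['amount_usd', 'status']
```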

Why do labels matter?

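You don't want to memorize that revenue is column 7. You want to write df['revenue']. Labels also align operations between DataFrames automatically: add two DataFrames and the matching index/column labels line up, and anything that doesn't match shows up as NaN instead of a silent bug.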
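The alignment behavior can be sketched like this. The two quarterly tables and their region labels are hypothetical, chosen so that only one row label overlaps.

```python
import pandas as pd

q1 = pd.DataFrame({"revenue": [100.0, 200.0]}, index=["north", "south"])
q2 = pd.DataFrame({"revenue": [10.0, 30.0]}, index=["south", "east"])

# Rows are matched by label, not by position.
# Only "south" exists in both frames, so only "south" gets a real sum (210.0);
# "north" and "east" come out as NaN.
total = q1 + q2
print(total)

# If a missing label should count as 0 instead, say so explicitly.
total_filled = q1.add(q2, fill_value=0)
print(total_filled)
```

With plain position-based arrays, the first version would have quietly added north to south. Here the mismatch is visible in the result, and `fill_value` makes the "treat missing as zero" decision an explicit one.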

Code

Building a DataFrame and a Series from common sources · python

import pandas as pd

# 1. From a dict of lists (the most common way to build one by hand)
df = pd.DataFrame({
    'order_id':   ['A001', 'A002', 'A003'],
    'amount_usd': [120.50, 87.30, 215.00],
    'status':     ['shipped', 'pending', 'shipped'],
})

# 2. A single column is a Series
amounts = df['amount_usd']            # Series
print(type(amounts).__name__)          # Series

# 3. Inspect the structure
print(df.shape)        # (3, 3)
print(df.dtypes)       # type of each column
print(df.head())       # first rows
print(df.describe())   # summary statistics for numeric columns

# 4. Select a single value (label-based)
print(df.loc[0, 'amount_usd'])         # 120.5

# 5. From a Parquet file (the realistic entry point)
# df = pd.read_parquet('orders.parquet')

# 6. In Pandas 3.x, PyArrow-backed strings are faster and smaller
#    (this line needs the pyarrow package installed)
df['status'] = df['status'].astype('string[pyarrow]')
