re: regular expressions in Python

The re module: Python's regex interface

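Regex is a separate little language inside the language, and the re module is your interface to it. Four functions cover most uses: re.search finds the first match anywhere in the string, re.match matches only at the start of the string, re.findall returns every match as a list, and re.sub replaces matches.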
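One detail of re.findall worth knowing up front: the shape of the list it returns depends on the pattern's capture groups. With no groups you get whole matches, with one group you get that group's text, and with two or more you get a tuple per match. A small sketch on a made-up string:

```python
import re

prices = "apple 3kg, pear 12kg, plum 7kg"

# No groups: each item is the whole match
whole = re.findall(r"\w+ \d+kg", prices)
print(whole)   # ['apple 3kg', 'pear 12kg', 'plum 7kg']

# One group: each item is just that group's text
nums = re.findall(r"(\d+)kg", prices)
print(nums)    # ['3', '12', '7']

# Two groups: each item is a tuple of the groups
pairs = re.findall(r"(\w+) (\d+)kg", prices)
print(pairs)   # [('apple', '3'), ('pear', '12'), ('plum', '7')]
```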

Compile or not: a performance question

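If you use the same pattern many times, compile it once with re.compile(pattern) and call .search/.findall on the compiled object; that skips re-processing the pattern on each call. Python automatically caches recently compiled patterns up to a small limit, but on a hot path an explicit compile is more predictable. For one-off use, the module-level functions are perfectly fine.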
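A rough way to measure the difference yourself with timeit. The log-like lines and the id=(\d+) pattern are made up for this sketch, and it needs Python 3.8+ for the := operator. Timings depend on your machine and Python version, and thanks to re's internal cache the gap is often small, so treat the printed numbers as indicative only:

```python
import re
import timeit

# Hypothetical input: 3000 short lines, two thirds of which contain an id
lines = ["user=alice id=42", "user=bob id=7", "no match here"] * 1000
PATTERN = r"id=(\d+)"

def module_level():
    # re.search goes through re's pattern cache on every call
    return [m.group(1) for line in lines if (m := re.search(PATTERN, line))]

ID_RE = re.compile(PATTERN)  # compiled once, reused below

def precompiled():
    return [m.group(1) for line in lines if (m := ID_RE.search(line))]

# Both approaches must return identical results; only speed may differ
assert module_level() == precompiled()

for fn in (module_level, precompiled):
    secs = timeit.timeit(fn, number=20)
    print(f"{fn.__name__}: {secs:.3f}s")
```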

The most common patterns

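\d is a digit, \w a word character (letter, digit, or underscore), \s whitespace. + means one or more, * zero or more, ? zero or one, {n,m} between n and m. ^ anchors to the start, $ to the end. (...) is a capture group, (?P<name>...) a named group, (?:...) a non-capturing group. | separates alternatives. [abc] is a character class.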
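A quick tour of most of these tokens on made-up strings. It also uses \b (a word boundary), which is not in the list above, to stop {n,m} from matching inside longer numbers:

```python
import re

# {n,m}: a run of 3 to 4 digits; \b keeps it from matching inside longer numbers
print(re.findall(r"\b\d{3,4}\b", "12 345 6789 10111"))   # ['345', '6789']

# [ae] character class and | alternation
print(re.findall(r"gr[ae]y", "grey gray griy"))          # ['grey', 'gray']
print(re.findall(r"cat|dog", "cat, bird, dog"))          # ['cat', 'dog']

# ? makes the preceding item optional
print(re.findall(r"colou?r", "color colour"))            # ['color', 'colour']

# ^ and $ anchor to the start and end of the string
print(bool(re.search(r"^\w+$", "hello")))                # True
print(bool(re.search(r"^\w+$", "hello world")))          # False

# (...) vs (?:...): same match, but only capturing groups appear in groups()
print(re.match(r"(ab)+", "ababx").groups())              # ('ab',)
print(re.match(r"(?:ab)+", "ababx").groups())            # ()
```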

Raw strings: the r prefix, everywhere

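Regex syntax is full of backslashes, and Python string literals interpret backslash sequences (\n, \t) on their own. To avoid double escaping, use raw strings: write r"\d+", not "\\d+". Make it a habit: every regex literal in your code starts with r.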
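Why the habit matters: some escapes mean one thing to Python and another to regex. "\b" is a word boundary in regex, but in a normal Python string it becomes a backspace character before re ever sees it, so the pattern quietly stops matching. A minimal demonstration:

```python
import re

# Same pattern text: the raw literal just avoids doubling the backslash
print(r"\d+" == "\\d+")                              # True

# Without r, Python turns "\b" into a backspace (\x08),
# so this searches for backspace-c-a-t-backspace and finds nothing
print(re.search("\bcat\b", "a cat here"))           # None

# With r, \b reaches re intact and means "word boundary"
print(re.search(r"\bcat\b", "a cat here").group())  # cat
```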

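War story: don't use regex to parse HTML, JSON, or anything else with nested structure. Python's re can't match brackets to arbitrary depth, so it can't reliably tell which closing tag or brace belongs to which opener. Use a real parser. Regex shines on flat patterns: phone numbers, log lines, fixed-format text. Lean on it where it's strong, and reach for something else where it isn't.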
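A small illustration of the failure mode, using a made-up JSON string. A non-greedy "grab the first {...}" pattern stops at the first closing brace, which belongs to the inner object, while the standard json module handles the nesting correctly:

```python
import json
import re

data = '{"user": {"name": "Pippa", "age": 4}, "active": true}'

# Regex attempt: the lazy .*? stops at the first "}", which closes
# the inner object, so the result is not even valid JSON
naive = re.search(r"\{.*?\}", data).group()
print(naive)                    # {"user": {"name": "Pippa", "age": 4}

# A real parser understands nesting
parsed = json.loads(data)
print(parsed["user"]["name"])   # Pippa
print(parsed["active"])         # True
```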

Code

search / match / findall / sub: the four core functions

import re

text = "Pippa is 4 years old. Dad is 50. Pippa loves coding."

# search: first match anywhere in the string
m = re.search(r"\d+", text)
print(m.group())              # '4'
print(m.span())               # (9, 10): (start, end)

# match: only at the start of the string (None here, text starts with "Pippa")
print(re.match(r"\d+", text))    # None
print(re.match(r"Pippa", text))  # <re.Match object; span=(0, 5), match='Pippa'>

# findall: every match, as a list
print(re.findall(r"\d+", text))     # ['4', '50']
print(re.findall(r"Pippa", text))   # ['Pippa', 'Pippa']

# sub: replace every match
print(re.sub(r"\d+", "AGE", text))
# 'Pippa is AGE years old. Dad is AGE. Pippa loves coding.'
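re.match anchors only at the start; it does not require the match to reach the end of the string. For "does the whole string match?" checks, re.fullmatch (Python 3.4+) is the right tool. A sketch comparing all three on a few made-up inputs:

```python
import re

samples = ["42", "42 years", "age 42"]

results = {}
for s in samples:
    results[s] = (
        bool(re.search(r"\d+", s)),     # a match anywhere?
        bool(re.match(r"\d+", s)),      # a match starting at position 0?
        bool(re.fullmatch(r"\d+", s)),  # does the match cover the whole string?
    )
    print(s, results[s])
# 42 (True, True, True)
# 42 years (True, True, False)
# age 42 (True, False, False)
```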

Capture groups: extracting parts

import re

log_line = "2026-05-02 15:30:42 [ERROR] Database connection failed"

# Numbered groups
m = re.match(r"(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2}) \[(\w+)\] (.+)", log_line)
if m:
    print(m.group(1))         # '2026-05-02'
    print(m.group(2))         # '15:30:42'
    print(m.group(3))         # 'ERROR'
    print(m.group(4))         # 'Database connection failed'
    print(m.groups())         # all four groups as a tuple

# Named groups: easier to read
m = re.match(
    r"(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) \[(?P<level>\w+)\] (?P<msg>.+)",
    log_line
)
if m:
    print(m.group("date"))    # '2026-05-02'
    print(m.group("level"))   # 'ERROR'
    print(m.groupdict())      # dict of named groups: {'date': '2026-05-02', ...}
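To pull records out of many lines at once, re.finditer on a compiled pattern yields one match object per hit, and groupdict() turns each into a dict. The second and third log lines below are invented for the sketch:

```python
import re

logs = """\
2026-05-02 15:30:42 [ERROR] Database connection failed
2026-05-02 15:31:07 [INFO] Retrying
2026-05-02 15:31:09 [INFO] Connected"""

LINE_RE = re.compile(
    r"(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) \[(?P<level>\w+)\] (?P<msg>.+)"
)

# '.' does not match a newline by default, so (?P<msg>.+) stops at each line end
records = [m.groupdict() for m in LINE_RE.finditer(logs)]
for r in records:
    print(r["time"], r["level"], r["msg"])
# 15:30:42 ERROR Database connection failed
# 15:31:07 INFO Retrying
# 15:31:09 INFO Connected
```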

Compile: for repeated use

import re

# Compile once (a rough email shape, not a full address validator)
email_re = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

emails = ["alice@example.com", "hello world", "bob+tag@x.org"]
for email in emails:
    # fullmatch, not match: match would also accept "alice@example.com junk"
    if email_re.fullmatch(email):
        print("valid:", email)
    else:
        print("not an email:", email)

# Compile flags
case_insensitive = re.compile(r"pippa", re.IGNORECASE)
print(case_insensitive.findall("Pippa, PIPPA, pippa, pippA"))
# ['Pippa', 'PIPPA', 'pippa', 'pippA']
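Flags can be combined with |. One combination that matters for the ^ and $ anchors: by default they refer to the whole string, and re.MULTILINE makes them match at the start and end of each line. A sketch on a made-up three-line string:

```python
import re

notes = "pippa: hi\nDad: hello\nPIPPA: bye"

# Without MULTILINE, ^ matches only at the very start of the string
print(re.findall(r"^pippa", notes, re.IGNORECASE))   # ['pippa']

# With MULTILINE, ^ also matches right after every newline
LINE_START = re.compile(r"^pippa", re.IGNORECASE | re.MULTILINE)
print(LINE_START.findall(notes))                     # ['pippa', 'PIPPA']
```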

sub with a callable: dynamic replacement

import re

text = "Pippa is 4 years old. Dad is 50."

# Double every number: re.sub calls this with each match object
def double_age(match):
    age = int(match.group())
    return str(age * 2)

print(re.sub(r"\d+", double_age, text))
# 'Pippa is 8 years old. Dad is 100.'

# Backreferences: refer to captured groups in the replacement string
print(re.sub(r"(\w+)@(\w+)", r"\2/\1", "alice@example"))
# 'example/alice'
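Two related tools: in a replacement string, named groups can be referenced as \g<name>, and re.subn works like re.sub but also returns how many replacements were made. The dates here are sample data:

```python
import re

dates = "Born 2022-03-14, started school 2026-09-01."

# \g<name> refers to a named group inside the replacement string
us_style = re.sub(
    r"(?P<y>\d{4})-(?P<m>\d{2})-(?P<d>\d{2})",
    r"\g<m>/\g<d>/\g<y>",
    dates,
)
print(us_style)   # Born 03/14/2022, started school 09/01/2026.

# re.subn returns (new_string, number_of_replacements)
new_text, count = re.subn(r"\d{4}", "YYYY", dates)
print(new_text)   # Born YYYY-03-14, started school YYYY-09-01.
print(count)      # 2
```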

Common pitfalls: greedy matching and escaping

import re

# Quantifiers are greedy by default: .+ runs to the LAST '>'
text = '<b>hello</b> <i>world</i>'
print(re.findall(r"<(.+)>", text))      # ['b>hello</b> <i>world</i']
# +? is the non-greedy (lazy) version: stops at the first '>'
print(re.findall(r"<(.+?)>", text))     # ['b', '/b', 'i', '/i']

# Special characters in a pattern must be escaped to match literally
text = "What is 1+1? It's 2."
print(re.findall(r"\?", text))          # ['?']
print(re.findall(r"\.", text))          # ['.']

# re.escape: escape a literal string for use inside a regex
lit = "3.14 (special)"
pattern = re.escape(lit)
print(pattern)                          # 3\.14\ \(special\)
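Two follow-ups to these pitfalls. A negated character class such as [^>]+ is often a clearer alternative to .+? because it simply cannot cross a '>'. And re.escape matters most when a pattern is built from user input, where characters like + and ? would otherwise be read as regex syntax. A small sketch (the needle string stands in for hypothetical user input):

```python
import re

# Negated character class: "one or more characters that are not '>'"
text = '<b>hello</b> <i>world</i>'
tags = re.findall(r"<([^>]+)>", text)
print(tags)    # ['b', '/b', 'i', '/i']

needle = "1+1?"    # pretend this came from a user
haystack = "What is 1+1? It's 2."

# Escaped: matched literally
print(re.findall(re.escape(needle), haystack))   # ['1+1?']

# Unescaped: "one or more 1s, then an optional 1", a different pattern entirely
print(re.findall(needle, haystack))              # ['1', '1']
```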

Exercise

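Write a multi-line string of fake server log entries (for example, "[2026-05-02 15:30:42] [ERROR] Login failed for alice@example.com from 192.168.1.5"). Using a single compiled regex with named groups, extract the date, time, level, message, email (if present), and IP address (if present). Run it on at least three sample lines and print the dict of named groups for each.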
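If you get stuck, here is one possible approach, not the only one. It assumes the message always follows the [LEVEL] tag, and it uses lookaheads (?=...), which are not covered above, to capture the optional email and IP from inside the message without consuming it; in Python's re, groups captured inside a lookahead keep their values. re.VERBOSE lets the pattern span several lines with comments, so literal spaces are written as "\ ". The email and IP sub-patterns are deliberately loose, and the sample lines are made up:

```python
import re

LOG_RE = re.compile(
    r"""
    ^\[(?P<date>\d{4}-\d{2}-\d{2})\ (?P<time>\d{2}:\d{2}:\d{2})\]   # [date time]
    \ \[(?P<level>[A-Z]+)\]\                                         # [LEVEL]
    # Lookaheads peek into the message without consuming it; each is
    # optional, so a missing email or IP leaves that group as None
    (?=(?:.*?(?P<email>[\w.+-]+@[\w-]+\.[\w.-]+))?)
    (?=(?:.*?\b(?P<ip>\d{1,3}(?:\.\d{1,3}){3})\b)?)
    (?P<message>.+)$
    """,
    re.VERBOSE | re.MULTILINE,
)

logs = """\
[2026-05-02 15:30:42] [ERROR] Login failed for alice@example.com from 192.168.1.5
[2026-05-02 15:31:00] [INFO] Scheduled backup started
[2026-05-02 15:32:10] [WARN] Too many requests from 10.0.0.7"""

for m in LOG_RE.finditer(logs):
    print(m.groupdict())
```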
