Feature Engineering as Leverage

~30 min · features, engineering

Where the lift comes from

Beyond the baseline, feature engineering typically buys more performance per hour than hyperparameter tuning. The trick is to engineer features that respect prediction-time constraints. Time-window aggregates, ratios, lags, and domain-specific groupings consistently outperform raw fields.

Pattern catalog

Time-window aggregate: count, sum, or mean over the last 7, 30, or 90 days, measured back from prediction time.
Ratio: a value relative to a baseline ("this user's spend / average spend for this plan").
Recency / frequency: days since the last action, number of actions per period.
Categorical interaction: a pair of categories that expresses a business segment ("tier × region").
Cross-entity rollup: features about the user's organization rather than the user alone.
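
The last two patterns in the catalog can be sketched on a toy table. This is a minimal illustration, not a prescribed implementation: the `users` frame, its columns (`org_id`, `tier`, `region`, `monthly_spend`), and the derived column names are all hypothetical.

```python
import pandas as pd

# Hypothetical toy data; every column name here is an assumption for illustration.
users = pd.DataFrame({
    "user_id": [1, 2, 3, 4],
    "org_id": ["a", "a", "b", "b"],
    "tier": ["pro", "free", "pro", "pro"],
    "region": ["eu", "eu", "us", "eu"],
    "monthly_spend": [100.0, 0.0, 250.0, 50.0],
})

# Categorical interaction: one combined segment label per row ("tier x region").
users["tier_x_region"] = users["tier"] + "_" + users["region"]

# Cross-entity rollup: aggregate over the user's organization, then broadcast
# the aggregate back to every member row with transform().
by_org = users.groupby("org_id")["monthly_spend"]
users["org_total_spend"] = by_org.transform("sum")
users["org_user_count"] = by_org.transform("count")

print(users[["user_id", "tier_x_region", "org_total_spend", "org_user_count"]])
```

In a real dataset the rollup would also need a time cutoff, so that the organization-level aggregate only uses rows available at prediction time.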

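Check every feature for leakage

For each engineered feature, write a one-sentence proof that it can be computed at prediction time using only the data available at that moment. If the proof is hard to write, treat the feature as leakage.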
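
One way to make the proof easy to write is to build the cutoff into the feature computation itself. The sketch below assumes two hypothetical tables, `events` (one row per event) and `snapshots` (one row per user with a `prediction_time`); the helper name `tickets_last_30d` is also an assumption for illustration.

```python
import pandas as pd

# Hypothetical event log and prediction snapshots (names are illustrative).
events = pd.DataFrame({
    "user_id": [1, 1, 1, 2],
    "event_time": pd.to_datetime(
        ["2024-01-05", "2024-01-20", "2024-02-10", "2024-01-25"]
    ),
    "ticket_count": [1, 2, 4, 5],
})
snapshots = pd.DataFrame({
    "user_id": [1, 2],
    "prediction_time": pd.to_datetime(["2024-02-01", "2024-02-01"]),
})

def tickets_last_30d(events: pd.DataFrame, snapshots: pd.DataFrame) -> pd.DataFrame:
    """Sum ticket_count over the 30 days strictly before each prediction_time."""
    joined = snapshots.merge(events, on="user_id", how="left")
    window_start = joined["prediction_time"] - pd.Timedelta(days=30)
    # Proof of legality: only rows with event_time < prediction_time pass this mask.
    in_window = (joined["event_time"] >= window_start) & (
        joined["event_time"] < joined["prediction_time"]
    )
    sums = joined[in_window].groupby("user_id")["ticket_count"].sum()
    out = snapshots.copy()
    # Users with no events in the window get 0 rather than NaN.
    out["tickets_30d"] = out["user_id"].map(sums).fillna(0)
    return out

features = tickets_last_30d(events, snapshots)
print(features)
```

User 1's event on 2024-02-10 falls after the prediction time and is excluded, so the one-sentence proof is simply the mask condition: every contributing row has `event_time < prediction_time`.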

Code

Time-aware rolling window in pandas · python

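# Sort so events are time-ordered within each user; offset windows need that.
df = df.sort_values(["user_id", "event_time"]).reset_index(drop=True)
# Trailing 30-day sum per user. The default window (t - 30D, t] includes the
# current row; pass closed="left" to rolling() to exclude it.
rolled = (
    df.set_index("event_time")
      .groupby("user_id")["ticket_count"]
      .rolling("30D")
      .sum()
)
# The result is ordered by user, then time, matching the sorted frame,
# so assign by position instead of relying on index alignment.
df["tickets_30d"] = rolled.to_numpy()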

Ratio and recency features · python

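# Ratio: each user's spend relative to the mean spend of their plan tier.
df["spend_vs_plan_avg"] = df["monthly_spend"] / df.groupby("plan_tier")["monthly_spend"].transform("mean")
# Recency: whole days from the last login to prediction time.
df["days_since_last_login"] = (df["prediction_time"] - df["last_login_at"]).dt.days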

Exercise

Add three engineered features to your dataset: one rolling time-window aggregate, one ratio, and one recency feature. For each, write a one-sentence proof of prediction-time legality. Rerun cross-validation and report the lift.
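
As a rough sketch of the "rerun CV and report the lift" step, the comparison can be set up with scikit-learn as below. Everything here is an assumption for illustration: the data is synthetic, the target is deliberately built from the ratio so the engineered feature helps, and `LinearRegression` with R^2 stands in for whatever model and metric your baseline uses.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in data (hypothetical columns, fixed seed for repeatability).
rng = np.random.default_rng(0)
n = 500
X = pd.DataFrame({
    "monthly_spend": rng.uniform(1, 10, n),
    "plan_avg_spend": rng.uniform(1, 10, n),
})
# The target depends on the ratio, so a linear model on raw fields underfits.
y = X["monthly_spend"] / X["plan_avg_spend"] + rng.normal(0, 0.05, n)

# Engineered feature set: raw fields plus the ratio.
X_eng = X.assign(spend_vs_plan_avg=X["monthly_spend"] / X["plan_avg_spend"])

# Use the same folds for both runs so the scores are directly comparable.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
model = LinearRegression()
base = cross_val_score(model, X, y, cv=cv, scoring="r2")
eng = cross_val_score(model, X_eng, y, cv=cv, scoring="r2")

lift = eng.mean() - base.mean()
print(f"baseline R^2: {base.mean():.3f}")
print(f"engineered R^2: {eng.mean():.3f}")
print(f"lift: {lift:.3f}")
```

For time-ordered data, a shuffled `KFold` can itself leak future rows into training folds; a time-based splitter such as scikit-learn's `TimeSeriesSplit` is usually the safer choice there.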
