Decision Tree

~28 min · trees, interpretable


How a tree learns

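A decision tree greedily and recursively splits the data on feature values, choosing at each node the split that most reduces an impurity criterion (Gini for classification, MSE for regression). Each leaf then predicts the majority label (classification) or the mean target (regression) of the training rows that land in it. Trees natively handle missing values, mixed feature types, monotone and non-monotone relationships, and feature interactions, which is why they punch above their weight on tabular data.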
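To make the split search concrete, here is a minimal from-scratch sketch for a single numeric feature, assuming a greedy scan over midpoint thresholds. The names `gini`, `mse`, and `best_split` are illustrative, not scikit-learn's API; a full tree would repeat this search over every feature and then recurse into each child.

```python
import numpy as np


def gini(y):
    """Gini impurity: 1 - sum_k p_k**2, where p_k is the share of class k in the node."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)


def mse(y):
    """Regression impurity: mean squared deviation from the node's mean target."""
    return float(np.mean((y - np.mean(y)) ** 2))


def best_split(x, y, impurity=gini):
    """Greedy search over thresholds on one numeric feature.

    Returns (threshold, weighted child impurity); threshold is None if no split
    lowers the parent's impurity.
    """
    order = np.argsort(x)
    x_sorted, y_sorted = x[order], y[order]
    n = len(y)
    best_threshold, best_score = None, impurity(y)
    for i in range(1, n):
        if x_sorted[i] == x_sorted[i - 1]:
            continue  # equal values cannot be separated by a threshold
        left, right = y_sorted[:i], y_sorted[i:]
        # Child impurities weighted by the fraction of rows each child receives.
        score = (len(left) * impurity(left) + len(right) * impurity(right)) / n
        if score < best_score:
            best_threshold = (x_sorted[i - 1] + x_sorted[i]) / 2  # midpoint between neighbours
            best_score = score
    return best_threshold, best_score


x = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
y_class = np.array([0, 0, 0, 1, 1, 1])
y_reg = np.array([1.0, 1.2, 0.9, 5.0, 5.1, 4.8])
print(best_split(x, y_class))              # Gini: separates the classes cleanly at 6.5
print(best_split(x, y_reg, impurity=mse))  # MSE: also splits the two clusters at 6.5
```

The same search serves both tasks; only the impurity function changes, which mirrors how the criterion is a swappable parameter in tree libraries.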

Strengths and weaknesses

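Strength: easy to inspect. You can draw a single tree on a whiteboard and walk through it in a product review.
Strength: no feature scaling, little encoding pain, no normality assumptions.
Weakness: a single tree overfits aggressively at high depth.
Weakness: small perturbations of the data can produce wildly different trees.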
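Both weaknesses show up on synthetic data. This sketch assumes `make_classification` data with some label noise as a stand-in for a real table. It compares train and test accuracy for a depth-3 tree against an unrestricted one, then refits depth-3 trees on bootstrap resamples and records each tree's root split. Exact numbers depend on the data and seeds.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with label noise (flip_y), so a perfect training fit means memorizing noise.
X, y = make_classification(n_samples=600, n_features=8, n_informative=4,
                           flip_y=0.1, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Overfitting: compare a shallow tree with an unrestricted one (max_depth=None).
scores = {}
for depth in (3, None):
    clf = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_tr, y_tr)
    scores[depth] = (clf.score(X_tr, y_tr), clf.score(X_te, y_te))
    print(f"max_depth={depth}: train={scores[depth][0]:.2f}, test={scores[depth][1]:.2f}")

# Instability: refit on bootstrap resamples and record each tree's root split.
rng = np.random.default_rng(0)
root_splits = []
for _ in range(5):
    idx = rng.integers(0, len(X_tr), size=len(X_tr))
    clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr[idx], y_tr[idx])
    # Node 0 is the root; tree_.feature and tree_.threshold hold each node's split.
    root_splits.append((int(clf.tree_.feature[0]), round(float(clf.tree_.threshold[0]), 3)))
print(root_splits)
```

The unrestricted tree keeps splitting until its leaves are pure, so it fits the training set perfectly while doing worse on held-out rows; the printed root splits show how much the very first decision can move between resamples.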

How to use it

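Use shallow trees (depth 3-5) as an inspectable baseline and as a tool for explaining the data to stakeholders. For prediction quality, almost always graduate to a random forest or gradient boosting. Ensembling addresses the bias-variance trade-off that a single tree cannot escape: bagging (random forests) averages many deep, high-variance trees to reduce variance, while boosting adds shallow, high-bias trees one at a time to reduce bias.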
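A rough way to check the "graduate to an ensemble" advice on your own data is to cross-validate a shallow tree against both ensemble families. The synthetic dataset and hyperparameters below are assumptions for illustration; on some tables the gap is small, so measure rather than assume.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a real tabular dataset (an assumption for this sketch).
X, y = make_classification(n_samples=1000, n_features=20, n_informative=6,
                           flip_y=0.05, random_state=0)

models = {
    # Inspectable baseline: shallow enough to print and review.
    "shallow tree (depth 4)": DecisionTreeClassifier(max_depth=4, random_state=0),
    # Bagging: averages many deep trees to reduce variance.
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
    # Boosting: adds shallow trees sequentially to reduce bias.
    "gradient boosting": GradientBoostingClassifier(random_state=0),
}

results = {}
for name, model in models.items():
    # 5-fold cross-validated accuracy; the spread hints at how stable each model is.
    cv_scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    results[name] = cv_scores.mean()
    print(f"{name:>24}: {cv_scores.mean():.3f} +/- {cv_scores.std():.3f}")
```

Keeping the shallow tree in the comparison is deliberate: it tells you how much accuracy you give up in exchange for a model you can print on one page.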

Code

Train and visualize a shallow tree · python

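from sklearn.tree import DecisionTreeClassifier, export_text

# Assumes X_train is a pandas DataFrame of features and y_train holds the class labels.
# max_depth=4 keeps the tree readable; class_weight="balanced" upweights rare classes.
tree = DecisionTreeClassifier(max_depth=4, class_weight="balanced", random_state=7)
tree.fit(X_train, y_train)
# Print the learned splits as indented text rules, one line per node.
print(export_text(tree, feature_names=list(X_train.columns)))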
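The snippet above assumes `X_train` and `y_train` already exist. For a fully self-contained run, here is a sketch that uses scikit-learn's bundled Iris data as a stand-in for your own table, then adds a few quick checks (`get_depth`, `get_n_leaves`, held-out accuracy, and impurity-based `feature_importances_`) before the tree is shown to anyone.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Iris stands in for your own data; as_frame=True gives a DataFrame with column names.
X, y = load_iris(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=7)

tree = DecisionTreeClassifier(max_depth=4, class_weight="balanced", random_state=7)
tree.fit(X_train, y_train)
print(export_text(tree, feature_names=list(X_train.columns)))

# Quick size and quality checks before the tree goes in front of stakeholders.
print("depth:", tree.get_depth(), "leaves:", tree.get_n_leaves())
print("held-out accuracy:", round(tree.score(X_test, y_test), 3))

# Impurity-based importances: share of total impurity reduction credited to each feature.
ranked = sorted(zip(X_train.columns, tree.feature_importances_), key=lambda t: -t[1])
for name, importance in ranked:
    print(f"{name}: {importance:.3f}")
```

Impurity-based importances can favour features with many distinct values, so treat them as a conversation starter rather than a verdict.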

Plot the tree in a notebook · python

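from sklearn.tree import plot_tree
import matplotlib.pyplot as plt

plt.figure(figsize=(14, 7))
# filled=True colors each node by its majority class. max_depth=3 only truncates the drawing;
# nodes of the fitted depth-4 tree below that level are shown as "(...)".
plot_tree(tree, filled=True, feature_names=list(X_train.columns), max_depth=3)
plt.show()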


Exercise

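Train a max_depth=4 decision tree on your dataset and print it with export_text. Show it to a non-technical stakeholder and ask whether the splits match their domain intuition. Write down any surprises as data hypotheses to investigate.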
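For the stakeholder conversation, a flat list of root-to-leaf rules can be easier to read than an indented tree. The helper below, `leaf_rules`, is a hypothetical name for this sketch; it walks the fitted tree's low-level arrays (`tree_.children_left`, `children_right`, `feature`, `threshold`, `value`, `n_node_samples`) and uses Iris as a stand-in dataset.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier


def leaf_rules(tree, feature_names):
    """Return one readable rule per leaf of a fitted DecisionTreeClassifier."""
    t = tree.tree_
    rules = []

    def walk(node, conditions):
        left, right = t.children_left[node], t.children_right[node]
        if left == right:  # leaves have no children (both entries are -1)
            # value[node][0] holds per-class weights in the order of tree.classes_.
            label = tree.classes_[t.value[node][0].argmax()]
            path = " AND ".join(conditions) or "(all rows)"
            rules.append(f"IF {path} THEN {label}  [{t.n_node_samples[node]} training rows]")
            return
        name, thr = feature_names[t.feature[node]], t.threshold[node]
        walk(left, conditions + [f"{name} <= {thr:.2f}"])  # rows with value <= threshold go left
        walk(right, conditions + [f"{name} > {thr:.2f}"])

    walk(0, [])
    return rules


# Iris as a stand-in dataset; labels mapped to species names so rules read naturally.
data = load_iris()
y = data.target_names[data.target]
clf = DecisionTreeClassifier(max_depth=4, random_state=7).fit(data.data, y)
rules = leaf_rules(clf, list(data.feature_names))
for rule in rules:
    print(rule)
```

Listing the training-row count next to each rule helps the stakeholder tell a broad pattern from a leaf that rests on only a handful of examples.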
