Decision Tree + Python_Code

맨사설 2021. 8. 26. 17:06

◎ Mathematical concepts

○ Entropy: when the samples are not well separated (the classes are mixed together), entropy is high; when they are well separated, entropy is low.
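The post does not give a formula, so the following is only a minimal sketch assuming the usual Shannon entropy with a base-2 logarithm. The helper name `entropy` and the two toy label lists are made up for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels (hypothetical helper)."""
    total = len(labels)
    counts = Counter(labels)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())

mixed = [0, 1, 0, 1]   # two classes evenly mixed: poorly separated
pure = [1, 1, 1, 1]    # a single class: perfectly separated

print(entropy(mixed))  # 1.0
print(entropy(pure))   # 0.0
```

The evenly mixed group reaches the maximum for two classes, while the pure group has zero entropy.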

○ Information Gain: Entropy(before) - Entropy(after)

The difference in entropy before and after the split at a given node of the decision tree.
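As a rough sketch of this difference, the code below assumes that Entropy(after) is the size-weighted average of the entropies of the child groups, which is the common convention but is not spelled out in the post. The helper names and toy labels are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return sum(-(c / total) * math.log2(c / total)
               for c in Counter(labels).values())

def information_gain(parent, children):
    """Entropy(before) minus the size-weighted entropy of the child groups."""
    n = len(parent)
    after = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - after

parent = [0, 0, 1, 1]
perfect_split = [[0, 0], [1, 1]]   # each child is pure
useless_split = [[0, 1], [0, 1]]   # children are as mixed as the parent

print(information_gain(parent, perfect_split))  # 1.0
print(information_gain(parent, useless_split))  # 0.0
```

A split that separates the classes completely recovers all of the parent's entropy, while a split that leaves the children just as mixed gains nothing.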

○ Classification Tree

○ Regression Tree
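The two tree types differ in what a leaf predicts: a class label for classification, a number for regression. The sketch below uses made-up one-dimensional toy data (not from the post) and depth-1 trees, so each tree has exactly two leaves.

```python
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

X = [[0], [1], [2], [3]]   # toy feature values, for illustration only

# Classification tree: the target is a class label.
clf = DecisionTreeClassifier(max_depth=1).fit(X, [0, 0, 1, 1])
print(clf.predict([[0.2], [2.7]]))   # class labels: 0 and 1

# Regression tree: the target is a number; with the default criterion,
# each leaf predicts the mean of the training targets that fall into it.
reg = DecisionTreeRegressor(max_depth=1).fit(X, [1.0, 3.0, 10.0, 12.0])
print(reg.predict([[0.2], [2.7]]))   # leaf means: 2.0 and 11.0
```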

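◎ Decision tree

A model that builds splitting criteria from the variables, uses them to partition the samples, and makes estimates based on the properties of each resulting group.
Advantages: highly interpretable, intuitive, general-purpose.
Disadvantages: high variance; the tree can be sensitive to the particular sample it was trained on.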

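※ Decision tree terminology

ⓐ Node - where the variable used as the splitting criterion sits. Samples are divided according to it.

- Parent node: the node directly above

- Child node: the node directly below

- Root node: the topmost node, which has no parent

- Leaf node (Tip): a bottom node, which has no children

- Internal node: any node that is not a leaf node

ⓑ Edge - where the condition that splits the samples sits

ⓒ Depth - the number of edges that must be traversed to reach a given node from the root node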
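These terms can be checked on a fitted scikit-learn tree. The following is a sketch only: it assumes the iris data used later in the post and a depth limit of 2, and reads the fitted structure through the `tree_` attribute.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)

t = clf.tree_
is_leaf = t.children_left == -1        # scikit-learn marks "no child" with -1
n_leaves = int(is_leaf.sum())
n_internal = t.node_count - n_leaves   # the root plus every other non-leaf node

print("nodes:", t.node_count)
print("leaf nodes:", n_leaves)
print("internal nodes:", n_internal)
print("edges:", t.node_count - 1)      # every node except the root has one incoming edge
print("depth of the deepest leaf:", clf.get_depth())
```

Because every internal node of a scikit-learn tree has exactly two children, the node count is always twice the leaf count minus one.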

Decision Tree Practice

1. Getting familiar with the functions and importing modules

Getting familiar with the functions

In [1]:

from sklearn import tree
X = [[0, 0], [1, 1]]
Y = [0, 1]
clf = tree.DecisionTreeClassifier()
clf = clf.fit(X, Y)

In [2]:

clf.predict([[1, 1]])

Out[2]:

array([1])

Importing modules

In [3]:

from sklearn.datasets import load_iris
from sklearn import tree
from os import system

In [4]:

import os
os.environ["PATH"]+=os.pathsep+'C:/Program Files (x86)/Graphviz/bin/'

In [5]:

!pip install graphviz

Requirement already satisfied: graphviz in c:\work\envs\datascience\lib\site-packages (0.17)

In [6]:

import graphviz

Loading the data

In [7]:

iris=load_iris()

In [10]:

iris.feature_names

Out[10]:

['sepal length (cm)',
 'sepal width (cm)',
 'petal length (cm)',
 'petal width (cm)']

In [11]:

iris.target

Out[11]:

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])

In [12]:

iris.target_names

Out[12]:

array(['setosa', 'versicolor', 'virginica'], dtype='<U10')

2. Building and visualizing the decision tree

Building the tree

In [13]:

clf=tree.DecisionTreeClassifier()
clf=clf.fit(iris.data,iris.target)

Visualizing the tree

In [14]:

dot_data=tree.export_graphviz(clf,out_file=None, 
                              feature_names=iris.feature_names,
                             class_names=iris.target_names,
                             filled=True,rounded=True,
                             special_characters=True)
graph=graphviz.Source(dot_data)

In [15]:

graph

Out[15]:
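If the Graphviz binary is not installed, the same kind of tree can also be inspected as plain text. This is a sketch using scikit-learn's `tree.export_text`; the `random_state=0` argument is an addition for reproducibility and is not in the original cells.

```python
from sklearn.datasets import load_iris
from sklearn import tree

iris = load_iris()
clf = tree.DecisionTreeClassifier(random_state=0).fit(iris.data, iris.target)

# Plain-text rendering of the fitted tree; no Graphviz binary is needed.
text = tree.export_text(clf, feature_names=iris.feature_names)
print(text)
```

Each indented line is one node: internal nodes show the splitting condition, and leaf nodes show the predicted class index.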

Tree using entropy

In [16]:

clf2=tree.DecisionTreeClassifier(criterion='entropy')

In [17]:

clf2.fit(iris.data,iris.target)

Out[17]:

DecisionTreeClassifier(criterion='entropy')
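The first tree used scikit-learn's default criterion (Gini impurity), while this one uses entropy. As a small check, the sketch below reads the impurity stored at the root node of each tree. Since iris has 50 samples per class, the values can be verified by hand. `random_state=0` is an addition for reproducibility.

```python
import math
from sklearn.datasets import load_iris
from sklearn import tree

iris = load_iris()
gini_tree = tree.DecisionTreeClassifier(random_state=0).fit(iris.data, iris.target)
entropy_tree = tree.DecisionTreeClassifier(criterion="entropy", random_state=0).fit(
    iris.data, iris.target)

# Impurity stored for node 0 (the root), which holds all 150 samples.
root_gini = gini_tree.tree_.impurity[0]
root_entropy = entropy_tree.tree_.impurity[0]

print(root_gini)      # 1 - 3 * (1/3)**2, about 0.667
print(root_entropy)   # log2(3), about 1.585
```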

In [18]:

dot_data2=tree.export_graphviz(clf2,out_file=None, 
                              feature_names=iris.feature_names,
                             class_names=iris.target_names,
                             filled=True,rounded=True,
                             special_characters=True)
graph2=graphviz.Source(dot_data2)

In [19]:

graph2

Out[19]:

Pruning

In [20]:

clf3=tree.DecisionTreeClassifier(criterion='entropy',max_depth=2)

In [21]:

clf3.fit(iris.data,iris.target)

Out[21]:

DecisionTreeClassifier(criterion='entropy', max_depth=2)

In [22]:

dot_data3=tree.export_graphviz(clf3,out_file=None, 
                              feature_names=iris.feature_names,
                             class_names=iris.target_names,
                             filled=True,rounded=True,
                             special_characters=True)
graph3=graphviz.Source(dot_data3)
graph3 # Cutting the tree off at a reasonable point is called pruning (here, by capping max_depth).

Out[22]:
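Capping `max_depth` stops the tree from growing in the first place. A different technique, not used in the post, is to grow the full tree and then cut it back with cost-complexity pruning via the `ccp_alpha` parameter. The sketch below is one possible way to explore it; `random_state=0` and the clipping of alpha are my own additions.

```python
from sklearn.datasets import load_iris
from sklearn import tree

iris = load_iris()
full = tree.DecisionTreeClassifier(criterion="entropy", random_state=0)

# Candidate alpha values, from no pruning up to pruning almost everything.
path = full.cost_complexity_pruning_path(iris.data, iris.target)

node_counts = []
for alpha in path.ccp_alphas:
    alpha = max(float(alpha), 0.0)   # guard against tiny negative rounding errors
    pruned = tree.DecisionTreeClassifier(
        criterion="entropy", random_state=0, ccp_alpha=alpha)
    pruned.fit(iris.data, iris.target)
    node_counts.append(pruned.tree_.node_count)
    print(f"ccp_alpha={alpha:.4f}  nodes={pruned.tree_.node_count}  "
          f"depth={pruned.get_depth()}")
```

Larger alpha values penalize tree size more heavily, so the node count shrinks as alpha grows.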

Computing the confusion matrix

In [23]:

from sklearn.metrics import confusion_matrix
confusion_matrix(iris.target,clf.predict(iris.data))

Out[23]:

array([[50,  0,  0],
       [ 0, 50,  0],
       [ 0,  0, 50]], dtype=int64)

In [24]:

confusion_matrix(iris.target,clf2.predict(iris.data))

Out[24]:

array([[50,  0,  0],
       [ 0, 50,  0],
       [ 0,  0, 50]], dtype=int64)

In [25]:

confusion_matrix(iris.target,clf3.predict(iris.data)) 
# With the pruned tree, some samples are misclassified

Out[25]:

array([[50,  0,  0],
       [ 0, 49,  1],
       [ 0,  5, 45]], dtype=int64)
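To put a number on the errors of the pruned tree, the matrix above can be summarized as follows. This is a sketch that copies the values from Out[25] and relies on scikit-learn's convention that rows are true classes and columns are predicted classes.

```python
import numpy as np

# Confusion matrix of the depth-limited tree, copied from Out[25].
# Rows are true classes, columns are predicted classes.
cm = np.array([[50, 0, 0],
               [0, 49, 1],
               [0, 5, 45]])

accuracy = np.trace(cm) / cm.sum()         # correct predictions / all predictions
recall = np.diag(cm) / cm.sum(axis=1)      # per true class
precision = np.diag(cm) / cm.sum(axis=0)   # per predicted class

print(accuracy)    # 0.96
print(recall)
print(precision)
```

The pruned tree gets 144 of 150 samples right, and most of its errors are virginica samples predicted as versicolor.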

3. Training/test split and confusion matrix computation

In [26]:

from sklearn.model_selection import train_test_split

In [27]:

X_train, X_test, y_train, y_test = train_test_split(iris.data,iris.target,
                                                   stratify=iris.target,
                                                   random_state=1)

In [28]:

clf4=tree.DecisionTreeClassifier(criterion="entropy")

In [29]:

clf4.fit(X_train,y_train)

Out[29]:

DecisionTreeClassifier(criterion='entropy')

In [30]:

confusion_matrix(y_test,clf4.predict(X_test))

Out[30]:

array([[12,  0,  0],
       [ 0, 13,  0],
       [ 0,  1, 12]], dtype=int64)
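The earlier confusion matrices were computed on the same data the trees were trained on, which is why the unpruned trees looked perfect. The sketch below repeats the split from this section and compares accuracy on the training and test sets. `random_state=0` on the classifier is an addition for reproducibility, so the test score may differ slightly from the post's run.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn import tree

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, stratify=iris.target, random_state=1)

clf = tree.DecisionTreeClassifier(criterion="entropy", random_state=0)
clf.fit(X_train, y_train)

print(len(X_train), len(X_test))   # 112 38 (the default test_size is 0.25)
train_acc = clf.score(X_train, y_train)
test_acc = clf.score(X_test, y_test)
print(train_acc, test_acc)
```

An unrestricted tree memorizes its training set, so only the test score says anything about how well it generalizes.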

4. Decision regression tree

Importing modules and generating data

In [31]:

import numpy as np
from sklearn.tree import DecisionTreeRegressor
import matplotlib.pyplot as plt

rng = np.random.RandomState(1)
X = np.sort(5 * rng.rand(80, 1), axis=0)
y = np.sin(X).ravel()
y[::5] += 3 * (0.5 - rng.rand(16))

Building the regression tree

In [32]:

regr1=tree.DecisionTreeRegressor(max_depth=2)
regr2=tree.DecisionTreeRegressor(max_depth=5)

In [33]:

regr1.fit(X,y)

Out[33]:

DecisionTreeRegressor(max_depth=2)

In [34]:

regr2.fit(X,y)

Out[34]:

DecisionTreeRegressor(max_depth=5)

In [35]:

X_test=np.arange(0.0,5.0,0.01)[:,np.newaxis]

In [36]:

# Predict
y_1=regr1.predict(X_test)
y_2=regr2.predict(X_test)

In [37]:

# How the predictions change with depth
plt.figure()
plt.scatter(X, y, s=20, edgecolor="black",
            c="darkorange", label="data")
plt.plot(X_test, y_1, color="cornflowerblue",
         label="max_depth=2", linewidth=2)
plt.plot(X_test, y_2, color="yellowgreen", label="max_depth=5", linewidth=2)
plt.xlabel("data")
plt.ylabel("target")
plt.title("Decision Tree Regression")
plt.legend()
plt.show()
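The plot can be backed up with numbers. The following sketch regenerates the same synthetic data and compares the two depths by leaf count and training error. The `results` dictionary and `random_state=0` are additions for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error

# Same synthetic data as in the post.
rng = np.random.RandomState(1)
X = np.sort(5 * rng.rand(80, 1), axis=0)
y = np.sin(X).ravel()
y[::5] += 3 * (0.5 - rng.rand(16))

results = {}
for depth in (2, 5):
    regr = DecisionTreeRegressor(max_depth=depth, random_state=0).fit(X, y)
    mse = mean_squared_error(y, regr.predict(X))   # error on the training data
    results[depth] = {"leaves": regr.get_n_leaves(), "mse": mse}
    print(f"max_depth={depth}: leaves={regr.get_n_leaves()}, training MSE={mse:.4f}")
```

A tree of depth d has at most 2**d leaves, so the depth-2 curve can take at most four distinct values. The deeper tree follows the training points more closely, including the noisy ones.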

In [38]:

dot_data4 = tree.export_graphviz(regr2, out_file=None, 
                                filled=True, rounded=True,  
                                special_characters=True)

In [39]:

graph4 = graphviz.Source(dot_data4) 
graph4

Out[39]:

In [40]:

dot_data5 = tree.export_graphviz(regr1, out_file=None, 
                                filled=True, rounded=True,  
                                special_characters=True)

In [41]:

graph5 = graphviz.Source(dot_data5) 
graph5 # tree with depth 2

Out[41]:

In [42]:

from IPython.display import display, HTML
display(HTML("<style>.container {width:80% !important;}</style>"))

License: Attribution, NonCommercial, NoDerivatives
