Stock Price Analysis

맨사설 2021. 8. 18. 13:25

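- It has already been five months since I started investing in stocks.
- After plenty of ups and downs, I now buy stocks according to my own principle(?).

- These days I earn steady returns, and I even treat my parents with the profits. ^^

- ~~I buy stocks where all three line up: foreigners, institutions, and pension funds. (This may be common knowledge, though...)~~

- That principle has rarely failed me, so I want to confirm it with my own eyes by analyzing real data.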

◎ Stock Price Analysis Project

Pension-fund trading data is hard to collect,
so I will check how much foreign and institutional trading affects stock prices.

● Collecting Data by Web Crawling

In [1]:

!pip install Selenium

Requirement already satisfied: Selenium in c:\work\envs\datascience\lib\site-packages (3.141.0)
Requirement already satisfied: urllib3 in c:\work\envs\datascience\lib\site-packages (from Selenium) (1.26.4)

In [2]:

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

from bs4 import BeautifulSoup
import time
import pandas as pd
from selenium.common.exceptions import NoSuchElementException

In [3]:

def get_stock(code, page_num):
    chrome_driver = 'C:/dev_python/webdriver/chromedriver.exe'
    wd = webdriver.Chrome(chrome_driver)
    # create empty lists
    gov_list=[]
    for_list=[]
    price_list=[]
    for page_no in range(1,page_num+1):
        url = 'https://finance.naver.com/item/frgn.nhn?code={}&page={}'.format(code, page_no)
        wd.get(url)
        page_ul = wd.find_element_by_tag_name('tbody')
        for i in range(4,33):
            govs = page_ul.find_elements_by_xpath('//*[@id="content"]/div[2]/table[1]/tbody/tr[{}]/td[6]/span' .format(i))
            gov_list += [gov.text for gov in govs] # institutional trading
            fors = page_ul.find_elements_by_xpath('//*[@id="content"]/div[2]/table[1]/tbody/tr[{}]/td[7]/span' .format(i))
            for_list += [fore.text for fore in fors] # foreign trading
            prices = page_ul.find_elements_by_xpath('//*[@id="content"]/div[2]/table[1]/tbody/tr[{}]/td[4]/span' .format(i))
            price_list += [price.text for price in prices] # that day's price change

    # note: the "Government" column holds the institutional figures
    stock_df = pd.DataFrame({"Price ratio" : price_list,
                                   "Government" : gov_list,
                                   "Foreigner" : for_list})
    wd.close()
    stock_df.to_csv('C:/Users/설위준/Desktop/my room/data3/{}.csv' .format(code))
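
One thing to watch in the function above: the three lists are filled independently, so if a table row yields text for one cell but not another, the lists end up with different lengths and `pd.DataFrame` raises an error. A minimal sketch of a safer pattern, assuming the cell texts are first gathered per row; `rows_to_frame` and the sample rows are hypothetical and not part of the original notebook:

```python
import pandas as pd

COLUMNS = ["Price ratio", "Government", "Foreigner"]

def rows_to_frame(raw_rows):
    """Build the three-column frame from per-row (price, institution, foreigner) strings.

    Rows with any empty cell are skipped, so the columns always line up.
    """
    complete = [row for row in raw_rows if all(cell.strip() for cell in row)]
    return pd.DataFrame(complete, columns=COLUMNS)

# Hypothetical scraped rows; the second one mimics a table row with no data.
sample_rows = [
    ("-2.58%", "-47,106", "+150,147"),
    ("", "", ""),
    ("-1.78%", "-33,799", "+123,645"),
]
frame = rows_to_frame(sample_rows)
print(frame)
```

Collecting one tuple per row keeps each day's three values together, which the three separate lists cannot guarantee.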

In [4]:

def stock_num(name):

    chrome_driver = 'C:/dev_python/webdriver/chromedriver.exe'
    wd = webdriver.Chrome(chrome_driver)
    url = 'https://finance.naver.com/'
    wd.get(url)
    button = wd.find_element_by_name("query")
    button.send_keys(name)
    button.send_keys(Keys.ENTER)
    time.sleep(2)
    num = wd.find_element_by_xpath('//*[@id="middle"]/div[1]/div[1]/div/span[1]') # read the stock code
    get_stock(num.text,10)
    wd.close()
    # Each page holds 20 days of data, so page_num=10 fetches roughly the last 200 days for each stock.
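
The page-count arithmetic in that last comment can be written out explicitly. This is only a sketch: the 20-rows-per-page figure comes from the note above, and the helper names `pages_needed` and `page_urls` are made up for illustration.

```python
import math

ROWS_PER_PAGE = 20  # per the note above: one page lists 20 trading days

def pages_needed(days, rows_per_page=ROWS_PER_PAGE):
    """Number of pages to request to cover at least `days` trading days."""
    return math.ceil(days / rows_per_page)

def page_urls(code, days):
    """URLs to visit, using the same URL pattern as get_stock."""
    base = 'https://finance.naver.com/item/frgn.nhn?code={}&page={}'
    last_page = pages_needed(days)
    return [base.format(code, page_no) for page_no in range(1, last_page + 1)]

urls = page_urls("011200", 200)
print(len(urls))
print(urls[0])
```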

I focused on stocks that were hot at the time.

In [5]:

stock_num("HMM") # 011200
stock_num("SK바이오사이언스") # 302440
stock_num("SK아이이테크놀로지") # 361610
stock_num("후성") # 093370
stock_num("동국제강") # 001230
stock_num("NI스틸") # 008260
stock_num("두산중공업") # 034020
stock_num("한솔로지스틱스") # 009180
stock_num("코리안리") # 003690
stock_num("대우건설") # 047040
stock_num("대한전선") # 001440
stock_num("신성통상") # 005390
stock_num("만도") # 204320

○ Merging the Data

In [6]:

# Use glob to read every csv file in the data folder.
from glob import glob

# Load the list of csv files
file_names = glob("../my room/data3/*.csv")
total = pd.DataFrame()

# Merge all the csv files
for file_name in file_names:
    temp = pd.read_csv(file_name,encoding='utf-8')
    total = pd.concat([total,temp])
# reset_index assigns a fresh index.
total.reset_index(inplace=True, drop=True) # drop=True discards the old index
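
As a variation on the merge loop above, here is a sketch that reads each file with `index_col=0` (so no `Unnamed: 0` column appears) and records which file each row came from. The `code` column and the helper name `merge_csvs` are my own additions for illustration, and the demo files contain made-up rows.

```python
import tempfile
from glob import glob
from pathlib import Path

import pandas as pd

def merge_csvs(pattern):
    """Read every CSV matching `pattern` and stack them into one frame."""
    frames = []
    for file_name in sorted(glob(pattern)):
        # index_col=0 reuses the saved index instead of creating 'Unnamed: 0'
        temp = pd.read_csv(file_name, index_col=0, encoding="utf-8")
        temp["code"] = Path(file_name).stem  # hypothetical extra column: file name = stock code
        frames.append(temp)
    return pd.concat(frames, ignore_index=True)

# Demo on two tiny throwaway files.
with tempfile.TemporaryDirectory() as tmp:
    for code in ("011200", "302440"):
        demo = pd.DataFrame({"Price ratio": ["+0.76%"],
                             "Government": ["-64,283"],
                             "Foreigner": ["+126,244"]})
        demo.to_csv(Path(tmp) / f"{code}.csv")
    merged = merge_csvs(str(Path(tmp) / "*.csv"))

print(merged)
```

Keeping the stock code around would also make per-stock comparisons possible later, which the pooled frame does not allow.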

● Exploring and Preprocessing the Data

In [7]:

total = total.drop('Unnamed: 0',axis=1) # drop the unneeded column
total.head()

Out[7]:

	Price ratio	Government	Foreigner
0	-2.58%	-47,106	+150,147
1	-1.78%	-33,799	+123,645
2	-0.51%	-16,007	+104,535
3	+0.76%	-64,283	+126,244
4	-0.76%	-99,609	+27,962

In [8]:

total.info()
# the object columns need to be converted to numeric types

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2453 entries, 0 to 2452
Data columns (total 3 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   Price ratio  2453 non-null   object
 1   Government   2453 non-null   object
 2   Foreigner    2453 non-null   object
dtypes: object(3)
memory usage: 57.6+ KB

In [9]:

# convert the object column to a numeric type
import numpy as np
total['Government'] = total['Government'].apply(lambda x : x.replace(',',''))
total['Government'] = total['Government'].astype(int)

In [10]:

# convert the object column to a numeric type
total['Foreigner'] = total['Foreigner'].apply(lambda x : x.replace(',',''))
total['Foreigner'] = total['Foreigner'].astype(int)

In [11]:

# convert the object column to a numeric type
total['Price ratio'] = total['Price ratio'].apply(lambda x : x.replace('%',''))
total['Price ratio']= total['Price ratio'].astype(float)

In [12]:

total.info() # confirms that every column was converted

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2453 entries, 0 to 2452
Data columns (total 3 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   Price ratio  2453 non-null   float64
 1   Government   2453 non-null   int32  
 2   Foreigner    2453 non-null   int32  
dtypes: float64(1), int32(2)
memory usage: 38.5 KB
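
The three conversion cells repeat the same steps, so they could be folded into one helper. A minimal sketch, assuming the raw strings look like the ones in Out[7]; `to_number` is a hypothetical name, and it uses `pd.to_numeric` rather than the `astype` calls in the notebook:

```python
import pandas as pd

def to_number(series):
    """Strip thousands separators and percent signs, then parse as numbers."""
    cleaned = (series.str.replace(",", "", regex=False)
                     .str.replace("%", "", regex=False))
    return pd.to_numeric(cleaned)

# The first three rows of Out[7]
raw = pd.DataFrame({
    "Price ratio": ["-2.58%", "-1.78%", "-0.51%"],
    "Government": ["-47,106", "-33,799", "-16,007"],
    "Foreigner": ["+150,147", "+123,645", "+104,535"],
})
clean = raw.apply(to_number)
print(clean)
print(clean.dtypes)
```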

In [13]:

pd.set_option('display.float_format', '{:.2f}'.format)
total.describe()

Out[13]:

	Price ratio	Government	Foreigner
count	2453.00	2453.00	2453.00
mean	0.37	-45738.52	1246.52
std	4.32	736352.30	868949.62
min	-26.43	-23452601.00	-10816785.00
25%	-1.50	-66367.00	-84552.00
50%	0.00	-164.00	463.00
75%	1.65	41687.00	89210.00
max	30.00	2794981.00	27555434.00

● Analyzing the Data

In [14]:

import matplotlib.pyplot as plt
import pandas as pd
from pandas import DataFrame
from pandas import Series
from pandas.plotting import scatter_matrix
import seaborn as sns

In [15]:

# Configure matplotlib so Korean text renders correctly
import matplotlib
from matplotlib import font_manager, rc
import platform

try:
    if platform.system() == 'Windows':
        # Windows: use Malgun Gothic
        font_name = font_manager.FontProperties(fname="c:/Windows/Fonts/malgun.ttf").get_name()
        rc('font', family=font_name)
    else:
        # otherwise assume macOS
        rc('font', family='AppleGothic')
except Exception:
    # keep the default font if the font cannot be loaded
    pass
# keep minus signs readable with a non-default font
matplotlib.rcParams['axes.unicode_minus'] = False

In [16]:

total.head()

Out[16]:

	Price ratio	Government	Foreigner
0	-2.58	-47106	150147
1	-1.78	-33799	123645
2	-0.51	-16007	104535
3	0.76	-64283	126244
4	-0.76	-99609	27962

In [17]:

# visualize the pairwise relationships between the variables
sns.pairplot(total,kind='hist')
plt.show()

In [18]:

total.corr(method='pearson')

Out[18]:

	Price ratio	Government	Foreigner
Price ratio	1.00	0.16	0.15
Government	0.16	1.00	-0.15
Foreigner	0.15	-0.15	1.00
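
For reference, this is what `corr(method='pearson')` computes for each pair of columns. The sketch below uses only the five rows shown in Out[16], which is far too few for a meaningful estimate, and checks a hand-written formula against NumPy; the function name `pearson` is my own.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation: co-deviation divided by the product of the spreads."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))

# The five rows shown in Out[16]
price = [-2.58, -1.78, -0.51, 0.76, -0.76]
foreigner = [150147, 123645, 104535, 126244, 27962]

r_manual = pearson(price, foreigner)
r_numpy = float(np.corrcoef(price, foreigner)[0, 1])
print(r_manual, r_numpy)
```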

In [19]:

plt.rcParams["figure.figsize"] = (5,5)
sns.heatmap(total.corr(),
           annot = True, # show the actual values on the plot
           cmap = 'bone', # color map
           vmin = -1, vmax=1)
plt.title('Stock Heatmap')

Out[19]:

Text(0.5, 1.0, 'Stock Heatmap')

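Both institutional and foreign trading turn out to have only a slight effect on the price (correlations of 0.16 and 0.15).

However, this data pools the trading activity of many different stocks.

Because trading volumes differ from stock to stock, I will set positive values to 1 and all other values to 0 to put everything on a common scale, then analyze the data again.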

○ Recoding all three variables as 0 and 1 and rerunning the analysis

In [20]:

total1 = total[['Price ratio','Government','Foreigner']].copy() # work on a copy to avoid SettingWithCopyWarning
total1['Price ratio'] = [1 if s > 0 else 0 for s in total1['Price ratio']]
total1['Government'] = [1 if s > 0 else 0 for s in total1['Government']]
total1['Foreigner'] = [1 if s > 0 else 0 for s in total1['Foreigner']]
total1.head()

Out[20]:

	Price ratio	Government	Foreigner
0	0	0	1
1	0	0	1
2	0	0	1
3	1	0	1
4	0	0	1
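
The same recoding can be done in one vectorized step instead of three list comprehensions. A small sketch on the five rows from Out[16]; `to_direction` is a hypothetical helper name:

```python
import pandas as pd

def to_direction(frame):
    """Map each value to 1 if it is strictly positive, otherwise 0."""
    return (frame > 0).astype(int)

# The five rows shown in Out[16]
sample = pd.DataFrame({
    "Price ratio": [-2.58, -1.78, -0.51, 0.76, -0.76],
    "Government": [-47106, -33799, -16007, -64283, -99609],
    "Foreigner": [150147, 123645, 104535, 126244, 27962],
})
binary = to_direction(sample)
print(binary)
```

Note that a value of exactly 0 (an unchanged price, or no net buying) is mapped to 0, just as in the list comprehensions.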

In [21]:

total1.corr(method='pearson')

Out[21]:

	Price ratio	Government	Foreigner
Price ratio	1.00	0.24	0.28
Government	0.24	1.00	0.04
Foreigner	0.28	0.04	1.00
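
A side note on what these numbers mean: for two 0/1 columns, the Pearson correlation equals the phi coefficient of their 2x2 contingency table. The sketch below checks this on a small made-up sample, not on the scraped data; `phi_coefficient` is a hypothetical helper.

```python
import numpy as np
import pandas as pd

def phi_coefficient(a, b):
    """Phi coefficient from the four cell counts of a 2x2 table."""
    a = np.asarray(a)
    b = np.asarray(b)
    n11 = int(((a == 1) & (b == 1)).sum())
    n10 = int(((a == 1) & (b == 0)).sum())
    n01 = int(((a == 0) & (b == 1)).sum())
    n00 = int(((a == 0) & (b == 0)).sum())
    numerator = n11 * n00 - n10 * n01
    denominator = np.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return float(numerator / denominator)

# Made-up example: 1 = net buying / price up, 0 = otherwise
buy = pd.Series([1, 1, 0, 0, 0, 1, 1, 0, 1, 0])
up = pd.Series([1, 1, 1, 0, 0, 1, 0, 0, 1, 0])

phi = phi_coefficient(buy, up)
r = float(buy.corr(up))
print(phi, r)
```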

In [22]:

plt.rcParams["figure.figsize"] = (5,5)
sns.heatmap(total1.corr(),
           annot = True, # show the actual values on the plot
           cmap = 'coolwarm', # color map
           vmin = -1, vmax=1)
plt.title('Stock Heatmap')

Out[22]:

Text(0.5, 1.0, 'Stock Heatmap')

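Once the variables are reduced to a common 0/1 scale, the correlations with price rise from the earlier (0.16, 0.15) to (0.24, 0.28).

So the trading of institutions and foreigners clearly is linked to how a stock's price moves.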

○ Price on days when foreigners + institutions both bought

In [23]:

total2 = total1[(total1['Government']==1)&(total1['Foreigner']==1)]
total2.head()

Out[23]:

	Price ratio	Government	Foreigner
7	1	1	1
8	1	1	1
11	1	1	1
14	1	1	1
16	1	1	1

In [24]:

total2.describe()

Out[24]:

	Price ratio	Government	Foreigner
count	603.00	603.00	603.00
mean	0.76	1.00	1.00
std	0.43	0.00	0.00
min	0.00	1.00	1.00
25%	1.00	1.00	1.00
50%	1.00	1.00	1.00
75%	1.00	1.00	1.00
max	1.00	1.00	1.00

When foreigners and institutions both bought a stock, the price rose on a full 76% of those days!!
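
To get a feel for how precise that 76% is, here is a rough normal-approximation interval built from the rounded figures in Out[24] (mean 0.76, count 603). It treats the days as independent observations, which is an assumption on my part, and `proportion_interval` is a hypothetical helper.

```python
import math

def proportion_interval(p, n, z=1.96):
    """Normal-approximation interval for a proportion (z=1.96 is roughly 95%)."""
    se = math.sqrt(p * (1 - p) / n)  # standard error of the proportion
    return p - z * se, p + z * se, se

low, high, se = proportion_interval(0.76, 603)
print(f"standard error: {se:.4f}")
print(f"interval: {low:.3f} to {high:.3f}")
```

The same formula explains the `std` row: for a 0/1 column the standard deviation is about `sqrt(p * (1 - p))`, which for p = 0.76 is about 0.43, matching Out[24].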

○ Price on days when foreigners + institutions both sold

In [25]:

total2 = total1[(total1['Government']==0)&(total1['Foreigner']==0)]
total2.head()

Out[25]:

	Price ratio	Government	Foreigner
5	0	0	0
6	0	0	0
9	0	0	0
12	0	0	0
20	0	0	0

In [26]:

total2.describe()

Out[26]:

	Price ratio	Government	Foreigner
count	678.00	678.00	678.00
mean	0.26	0.00	0.00
std	0.44	0.00	0.00
min	0.00	0.00	0.00
25%	0.00	0.00	0.00
50%	0.00	0.00	0.00
75%	1.00	0.00	0.00
max	1.00	0.00	0.00

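When foreigners and institutions both sold a stock, the price rose on only 26% of those days!!

Because non-positive price changes were coded as 0, this means conversely that the price did not rise (it fell or stayed flat) on 74% of those days.

" Conclusion: when foreigners and institutions both buy a stock, I'll follow and buy it too!! "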
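
The two filters above cover only two of the four buy/sell combinations. A `groupby` gives the rise rate for all four at once. The sketch below runs on a tiny made-up frame with the same column names, so its numbers say nothing about the real data; `rise_rate_by_flow` is a hypothetical helper.

```python
import pandas as pd

def rise_rate_by_flow(frame):
    """Share of up days and number of days for each (Government, Foreigner) pair."""
    return (frame.groupby(["Government", "Foreigner"])["Price ratio"]
                 .agg(rise_rate="mean", days="count"))

# Made-up 0/1 data in the same layout as total1
toy = pd.DataFrame({
    "Government":  [1, 1, 1, 0, 0, 0, 1, 0],
    "Foreigner":   [1, 1, 1, 0, 0, 0, 0, 1],
    "Price ratio": [1, 1, 0, 0, 0, 1, 1, 0],
})
summary = rise_rate_by_flow(toy)
print(summary)
```

On the real `total1`, the (1, 1) and (0, 0) rows of this table should reproduce the means and counts shown in Out[24] and Out[26].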

In [27]:

from IPython.display import display, HTML
display(HTML("<style>.container {width:80% !important;}</style>"))

Attribution, NonCommercial, NoDerivatives
