Building My Own Robo-Advisor [2]

Finance 2016. 8. 30. 15:37

Crawling Stock Price Data with Python

 

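Fundamentally, a robo-advisor runs on large volumes of stock price and market-environment data plus an intelligent trading algorithm. To build your own, you therefore need not only the programming skill to implement your own trading algorithm and the analytical skill to work through large datasets, but also the ability to collect that data in bulk. In other words, a stock price data crawler is essential. This post looks at the techniques needed to build such a crawler and how to implement it.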

 

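Engineers dislike long introductions: conclusion first, explanation later. So I have simply pasted the source of the stock price data crawler I wrote below. If you are busy, just copy it and use it right away. If you are not, read on.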

 

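This code is a stock price data crawler written in Python. A robo-advisor needs two kinds of crawlers: a long-term crawler and a real-time crawler. The long-term crawler is for the very first start-up of the robo-advisor, when several years of stock price data have to be crawled in one go. The source below is the long-term crawler. After that, you need a simplified (faster) real-time crawler that fetches only the changed (added) data every day, or in real time; I will post that one later. The long-term crawler and the real-time crawler are of course built from similar code. Only the program flow differs a little to suit each purpose.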
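
The difference between the two crawlers comes down to which dates get requested. Here is a minimal sketch of that selection step, assuming dates are handled as YYYYMMDD integers as in the source below. The function name `dates_to_crawl` is hypothetical and not part of the original crawler.

```python
def dates_to_crawl(available_dates, last_date_in_db=None):
    """Return the dates that still need crawling, oldest first.

    available_dates: YYYYMMDD integers reported by the data source (any order).
    last_date_in_db: newest YYYYMMDD already stored, or None for an empty table.

    The last stored day is included on purpose: it may have been saved
    before the market closed, so it is crawled again and overwritten.
    """
    if last_date_in_db is None:
        # Long-term case: nothing stored yet, so take everything.
        return sorted(available_dates)
    # Incremental case: only the last stored day and anything newer.
    return sorted(d for d in available_dates if d >= last_date_in_db)
```

With an empty table this behaves like the long-term crawler; with a stored last date it behaves like the lighter incremental one.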

 

This crawler is implemented with CybosPlus and Python. CybosPlus is the system-trading API provided by Daishin Securities (대신증권). Information about CybosPlus is easy to find, so I will skip it here.
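
The source below issues one `BlockRequest()` per field and per date. As a hedged sketch of the same request pattern, the function below asks for several fields in a single call. It rests on two assumptions that should be checked against the CybosPlus help: that input 5 of `StockChart` accepts a list of field codes, and that the columns come back ordered by ascending field code. The function name `fetch_daily_rows` and the dictionary keys are hypothetical. The field codes themselves are the ones used in the source below.

```python
# Field codes as used in the source below; the key names are illustrative.
FIELD_CODES = {
    "date": 0, "startprice": 2, "highprice": 3,
    "lowprice": 4, "finalprice": 5, "tramt": 8,
}


def fetch_daily_rows(chart, itemcode, start_date, end_date, max_rows=200):
    """Request several fields at once and return one dict per trading day.

    chart is a StockChart-like object, for example
    win32com.client.Dispatch("CpSysdib.StockChart") on a CybosPlus machine.
    """
    # Assumption: columns are returned in ascending field-code order.
    names = sorted(FIELD_CODES, key=FIELD_CODES.get)
    chart.SetInputValue(0, itemcode)      # item code with the "A" prefix
    chart.SetInputValue(1, '1')           # request by date range
    chart.SetInputValue(2, end_date)      # ending date (YYYYMMDD)
    chart.SetInputValue(3, start_date)    # starting date (YYYYMMDD)
    chart.SetInputValue(4, max_rows)      # upper bound on records
    chart.SetInputValue(5, [FIELD_CODES[n] for n in names])  # assumed: list of codes
    chart.SetInputValue(6, ord('D'))      # daily chart
    chart.SetInputValue(9, '0')           # unadjusted prices, as in the source
    chart.BlockRequest()
    count = chart.GetHeaderValue(3)       # number of records received
    return [
        {name: chart.GetDataValue(col, row) for col, name in enumerate(names)}
        for row in range(count)
    ]
```

Because the chart object is passed in as a parameter, the function can be exercised with a stand-in object on a machine without CybosPlus installed.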

 

(Draft saved temporarily...)

 

 

 

 

#
# Developed by "Dr.Deeeep"
#
# intelligent version of the crawler. In this version:
#   - it checks the last stored day of each item in the database and crawls from that day onward.
#   - the data of that last day is overwritten because it may not have been the final value
#

 

import time
import mysql.connector
import win32com.client

config = {
  'user': 'USERID',
  'password': 'PASSWORD',
  'host': 'yourhost.com',
  'database': 'DBNAME',
  'raise_on_warnings': True,
}
# Don't forget to change the server setting to allow remote connections to MySQL.
# mysql examples are available at
# https://dev.mysql.com/doc/connector-python/en/connector-python-examples.html
cnx = mysql.connector.connect(**config)
cursor = cnx.cursor()

 

cpcybos = win32com.client.Dispatch("CpUtil.CpCybos")

if cpcybos.IsConnect == 1:
    print("Cybos connection succeeded")
else:
    print("Cybos connection failed")
    raise SystemExit(1)   # nothing below can work without a CybosPlus session

   
inStockCode = win32com.client.Dispatch("CpUtil.CpStockCode")    # general stock item info
inStockMst = win32com.client.Dispatch("dscbo1.StockMst")        # Current data
inStockChart = win32com.client.Dispatch("CpSysdib.StockChart")  # Past Data

totalItemNumInCybosPlus = inStockCode.GetCount()
print("Total number of items = ",totalItemNumInCybosPlus)

# print all items in BT format
# Some variables, such as changerate, fortramt, totaltradeprice, should be calculated on the fly, so they are filled with the dummy value '-999.0'

 

#itemcode
finalpriceArray = []        #inStockChart.SetInputValue(5, 5)
startpriceArray = []        #inStockChart.SetInputValue(5, 2)
highpriceArray = []         #inStockChart.SetInputValue(5, 3)
lowpriceArray = []          #inStockChart.SetInputValue(5, 4)
changerateArray = []         #inStockChart.SetInputValue(5, 6)       may need to calculate from (today final / yesterday final)
tramtArray = []             #inStockChart.SetInputValue(5, 8)
orgtramtArray = []          #inStockChart.SetInputValue(5, 20)      THIS data is not complete in cybosdb. cannot use..
fortramtArray = []          # -999.0 #inStockChart.SetInputValue(5, ???) may need to cal from (today fortotal - yesterday fortotal)
fortotalArray = []          #inStockChart.SetInputValue(5, 16)
forportionArray = []        #inStockChart.SetInputValue(5, 17)
totaltradepriceArray = []   # -999.0    #inStockChart.SetInputValue(5, 0) may need to cal from (tramt * final price)

#beyond BT Format
circulationrateArray = []   #inStockChart.SetInputValue(5, 25)
totalstocknumArray = []     #inStockChart.SetInputValue(5, 12)
sichongArray = []           #inStockChart.SetInputValue(5, 13)


###################################################
#
#   This number is the number of records to crawl per CybosPlus access.
#   It must be large enough to cover the gap between the last date in my DB and today, yet small enough not to waste time on each CybosPlus access.
numOfCrawling = 200        # n days
#
#   This string is the itemcode to start from when resuming a suspended or aborted crawling job
startingItemCode = "000000"    # 000000 for new start
#
#
###################################################


currentItemIndex = -1
for aItemcode in range(0, inStockCode.GetCount()):   # for each itemcode
    currentItemIndex = currentItemIndex + 1
    print("### progress = ", ((currentItemIndex * 100) / totalItemNumInCybosPlus), " %")
                     
    dateToCrawlArray = []       #inStockChart.SetInputValue(5, 0)

    # this line prints just the item code and name
    # print(inStockCode.GetData(0,aItemcode), inStockCode.GetData(1,aItemcode), inStockCode.GetData(2,aItemcode))

    itemcode = inStockCode.GetData(0,aItemcode)

    # this can be used for resuming a suspended process
    # this may not be necessary for the intelligent version of the crawler
    if itemcode < "A"+startingItemCode:
        print("Skipping "+itemcode)
        continue

    inStockChart.SetInputValue(0, itemcode)

    ## retrieve LAST DATE in DB of this itemcode ==========================
    ## if there is no data for this item, it is skipped for now (see the TODO below)
   
    print ("itemcode = ", itemcode)
    shortItemcode = itemcode[1:]

    # parameterized query: let the connector quote the value instead of concatenating strings
    sqlStr = "select max(date) from datatable where itemcode=%s"

    #print("sqlStr = ", sqlStr, ", newItemcode = ", shortItemcode)
    cursor.execute(sqlStr, (shortItemcode,))

 

    #######################################
    # modified source
    row = cursor.fetchone()
    if row[0]:
        lastDateInMyDB = row[0]
    else:                           # NULL: max(date) has no value, so there is no data for this item in the DB.
        print ("itemcode = ", itemcode, " is added newly... skipping..")
        print ("TODO:  add code here to handle newly added items. !!!!!!!!!!!!")
        continue
    #######################################

 


    ## retrieving LAST DATE in CybosPlus ========================
    inStockChart.SetInputValue(1, '2')  # request type: '2' = by number of records
    inStockChart.SetInputValue(4, 1)    # number of records to retrieve: just the latest one
    inStockChart.SetInputValue(5, 0)    # field type: date(0); others used below include finalprice(5), tr amt(8)
    inStockChart.SetInputValue(6, ord('D')) # chart type: daily
    inStockChart.SetInputValue(9, '0')  # adjusted price flag: '0' = unadjusted prices
    inStockChart.BlockRequest()
    lastDateInCybosPlus = inStockChart.GetDataValue(0,0)
    print("in itemcode = ", itemcode, ", The latest data in CybosPlus = ", lastDateInCybosPlus)


# Plan (implemented below):
#    dateToCrawlArray = retrieve DATE array from lastDateInMyDB to lastDateInCybosPlus
#
#    for aDate in dateToCrawlArray:
#        do the same as the other crawlers


    ## retrieve dateToCrawlArray
    print("in itemcode = ", itemcode, ", Now retrieving Date2Crawl array...")
    inStockChart.SetInputValue(1, '1')  # request type: '1' = by date range
    inStockChart.SetInputValue(3, lastDateInMyDB) # starting date
    inStockChart.SetInputValue(2, lastDateInCybosPlus) # ending date
    inStockChart.SetInputValue(4, numOfCrawling) # maximum number of records to request
    inStockChart.SetInputValue(5, 0)    # field type: date(0)
    inStockChart.SetInputValue(6, ord('D')) # chart type: daily
    inStockChart.BlockRequest()

    numofdate2Crawl = inStockChart.GetHeaderValue(3)    # num of rec retrieved
    print("in itemcode = ", itemcode, ", Num of DATE to crawl = ", numofdate2Crawl, " From = ", lastDateInMyDB, " to = ", lastDateInCybosPlus)
    for i in range(numofdate2Crawl):
        crawlDate = inStockChart.GetDataValue(0,i)   # this loop assumes records arrive newest first
        dateToCrawlArray.insert(0, crawlDate)        # prepend so the array ends up oldest first
        if int(crawlDate) <= int(lastDateInMyDB):    # reached the last stored day (re-crawled on purpose)
            break

    print("in itemcode = ", itemcode, ", dateToCrawlArray = ", dateToCrawlArray)

    for aCrawlDate in dateToCrawlArray:
       
        ## retrieving FINALPRICE =========================
        inStockChart.SetInputValue(1, '1')  # retrieve by num of specified date
        inStockChart.SetInputValue(3, aCrawlDate) # starting date
        inStockChart.SetInputValue(2, aCrawlDate) # ending date
        inStockChart.SetInputValue(4, 1) # get a data
        inStockChart.SetInputValue(5, 5)    # finalprice(5)
        inStockChart.SetInputValue(6, ord('D')) # chart type,  by Day
        inStockChart.SetInputValue(9, '0')  # modified price? yes. 0 for no
        inStockChart.BlockRequest()
        numofreturned_finalprice = inStockChart.GetHeaderValue(3)    # num of rec retrieved
        #print(" Num of retrieved record (FP) = ", numofreturned_finalprice)
        if (numofreturned_finalprice > 0):
            finalprice = inStockChart.GetDataValue(0,0)  
            #print("     final price was ", finalprice)
        else:
            finalprice = '-999.0'
            print(itemcode, "     No data. default final price: -999.0")


        ## retrieving STARTPRICE ============================
        inStockChart.SetInputValue(1, '1')  # retrieve by num of specified date
        inStockChart.SetInputValue(3, aCrawlDate) # starting date
        inStockChart.SetInputValue(2, aCrawlDate) # ending date
        inStockChart.SetInputValue(4, 1) # get a data
        inStockChart.SetInputValue(5, 2)    # startprice
        inStockChart.SetInputValue(6, ord('D')) # chart type,  by Day
        inStockChart.SetInputValue(9, '0')  # modified price? yes. 0 for no
        inStockChart.BlockRequest()
        numofreturned_startprice = inStockChart.GetHeaderValue(3)    # num of rec retrieved
        #print(" Num of retrieved record (SP) = ", numofreturned_startprice)
        if (numofreturned_startprice > 0):
            startprice = inStockChart.GetDataValue(0,0)      
            #print("    start price was ", startprice)
        else:
            startprice = '-999.0'
            print(itemcode, "              No data. default START PRICE : -999.0")


        ## retrieving HIGHPRICE ============================
        inStockChart.SetInputValue(1, '1')  # retrieve by num of specified date
        inStockChart.SetInputValue(3, aCrawlDate) # starting date
        inStockChart.SetInputValue(2, aCrawlDate) # ending date
        inStockChart.SetInputValue(4, 1) # get a data
        inStockChart.SetInputValue(5, 3)    # highprice
        inStockChart.SetInputValue(6, ord('D')) # chart type,  by Day
        inStockChart.SetInputValue(9, '0')  # modified price? yes. 0 for no
        inStockChart.BlockRequest()
        numofreturned_highprice = inStockChart.GetHeaderValue(3)    # num of rec retrieved
        #print(" Num of retrieved record (HP) = ", numofreturned_highprice)
        if (numofreturned_highprice > 0):
            highprice = inStockChart.GetDataValue(0,0)      
            #print("    high price was ", highprice)
        else:
            highprice = '-999.0'
            print(itemcode, "                 No data. default HIGH PRICE: -999.0")

 

        ## retrieving LOWPRICE ============================
        inStockChart.SetInputValue(1, '1')  # retrieve by num of specified date
        inStockChart.SetInputValue(3, aCrawlDate) # starting date
        inStockChart.SetInputValue(2, aCrawlDate) # ending date
        inStockChart.SetInputValue(4, 1) # get a data
        inStockChart.SetInputValue(5, 4)    # lowprice
        inStockChart.SetInputValue(6, ord('D')) # chart type,  by Day
        inStockChart.SetInputValue(9, '0')  # modified price? yes. 0 for no
        inStockChart.BlockRequest()
        numofreturned_lowprice = inStockChart.GetHeaderValue(3)    # num of rec retrieved
        #print(" Num of retrieved record (LowP) = ", numofreturned_lowprice)
        if (numofreturned_lowprice > 0):
            lowprice = inStockChart.GetDataValue(0,0)      
            #print("    low price was ", lowprice)
        else:
            lowprice = '-999.0'
            print(itemcode, "                   No data. default LOW PRICE: -999.0")

 

        ## CHANGERATE: placeholder only, to be calculated later ============================
        changerate = '-999.0'     
        #print("    changerate was ", changerate)


        ## retrieving TRAMT ============================
        inStockChart.SetInputValue(1, '1')  # retrieve by num of specified date
        inStockChart.SetInputValue(3, aCrawlDate) # starting date
        inStockChart.SetInputValue(2, aCrawlDate) # ending date
        inStockChart.SetInputValue(4, 1) # get a data
        inStockChart.SetInputValue(5, 8)    # tramt
        inStockChart.SetInputValue(6, ord('D')) # chart type,  by Day
        inStockChart.SetInputValue(9, '0')  # modified price? yes. 0 for no
        inStockChart.BlockRequest()
        numofreturned_tramt = inStockChart.GetHeaderValue(3)    # num of rec retrieved
        #print(" Num of retrieved record (TRAMT) = ", numofreturned_tramt)
        if (numofreturned_tramt > 0):
            tramt = inStockChart.GetDataValue(0,0)      
            #print("    tramt was ", tramt)
        else:
            tramt = '-999.0'
            print(itemcode, "                     No data. default TRAMT: -999.0")

 

 

        ## retrieving ORGTRAMT ============================
        inStockChart.SetInputValue(1, '1')  # retrieve by num of specified date
        inStockChart.SetInputValue(3, aCrawlDate) # starting date
        inStockChart.SetInputValue(2, aCrawlDate) # ending date
        inStockChart.SetInputValue(4, 1) # get a data
        inStockChart.SetInputValue(5, 20)    # org tramt
        inStockChart.SetInputValue(6, ord('D')) # chart type,  by Day
        inStockChart.SetInputValue(9, '0')  # modified price? yes. 0 for no
        inStockChart.BlockRequest()
        numofreturned_orgtramt = inStockChart.GetHeaderValue(3)    # num of rec retrieved
        #print(" Num of retrieved record (ORG TRAMT) = ", numofreturned_orgtramt)
        if (numofreturned_orgtramt > 0):
            orgtramt = inStockChart.GetDataValue(0,0)      
            #print("    orgtramt was ", orgtramt)
        else:
            orgtramt = '-999.0'
            print(itemcode[1:], "                      No data. default ORG TRAMT: -999.0")
           

        ## FORTRAMT: placeholder only, to be calculated later ============================
        fortramt = '-999.0'     
        #print("    fortramt was ", fortramt)

 

        ## retrieving FORTOTAL ============================
        inStockChart.SetInputValue(1, '1')  # retrieve by num of specified date
        inStockChart.SetInputValue(3, aCrawlDate) # starting date
        inStockChart.SetInputValue(2, aCrawlDate) # ending date
        inStockChart.SetInputValue(4, 1) # get a data
        inStockChart.SetInputValue(5, 16)    # fortotal
        inStockChart.SetInputValue(6, ord('D')) # chart type,  by Day
        inStockChart.SetInputValue(9, '0')  # modified price? yes. 0 for no
        inStockChart.BlockRequest()
        numofreturned_fortotal = inStockChart.GetHeaderValue(3)    # num of rec retrieved
        #print(" Num of retrieved record (FORTOTAL) = ", numofreturned_fortotal)
        if (numofreturned_fortotal > 0):
            fortotal = inStockChart.GetDataValue(0,0)      
            #print("    fortotal was ", fortotal)
        else:
            fortotal = '-999.0'
            print(itemcode, "                          No data. default FORTOTAL: -999.0")

       

        ## retrieving FORPORTION ============================
        inStockChart.SetInputValue(1, '1')  # retrieve by num of specified date
        inStockChart.SetInputValue(3, aCrawlDate) # starting date
        inStockChart.SetInputValue(2, aCrawlDate) # ending date
        inStockChart.SetInputValue(4, 1) # get a data
        inStockChart.SetInputValue(5, 17)    # forportion
        inStockChart.SetInputValue(6, ord('D')) # chart type,  by Day
        inStockChart.SetInputValue(9, '0')  # modified price? yes. 0 for no
        inStockChart.BlockRequest()
        numofreturned_forportion = inStockChart.GetHeaderValue(3)    # num of rec retrieved
        #print(" Num of retrieved record (FORPORTION) = ", numofreturned_forportion)
        if (numofreturned_forportion > 0):
            forportion = inStockChart.GetDataValue(0,0)
            forportion = float("{0:.4f}".format(forportion))
            #print("    forportion was ", forportion)
        else:
            forportion = '-999.0'
            print(itemcode, "                          No data. default FORPORTION: -999.0")

       

        ## TOTALTRADEPRICE: placeholder only, to be calculated later ============================
        totaltradeprice = '-999.0'   
        #print("    total trade price was ", totaltradeprice)


        ## CIRCULATIONRATE: placeholder only, not stored in the table ============================
        circulationrate = '-999.0'   
        #print("    circulation rate was ", circulationrate)
       

        ## retrieving TOTALSTOCKNUM ============================
        inStockChart.SetInputValue(1, '1')  # retrieve by num of specified date
        inStockChart.SetInputValue(3, aCrawlDate) # starting date
        inStockChart.SetInputValue(2, aCrawlDate) # ending date
        inStockChart.SetInputValue(4, 1) # get a data
        inStockChart.SetInputValue(5, 12)    # totalstocknum
        inStockChart.SetInputValue(6, ord('D')) # chart type,  by Day
        inStockChart.SetInputValue(9, '0')  # modified price? yes. 0 for no
        inStockChart.BlockRequest()
        numofreturned_totalstocknum = inStockChart.GetHeaderValue(3)    # num of rec retrieved
        #print(" Num of retrieved record (TOTALSTOCKNUM) = ", numofreturned_totalstocknum)
        if (numofreturned_totalstocknum > 0):
            totalstocknum = inStockChart.GetDataValue(0,0)      
            #print("    totalstocknum was ", totalstocknum)
        else:
            totalstocknum = '-999.0'
            print(itemcode, "                     No data. default TOTALSTOCKNUM: -999.0")
           


        ## retrieving SICHONG ============================
        inStockChart.SetInputValue(1, '1')  # retrieve by num of specified date
        inStockChart.SetInputValue(3, aCrawlDate) # starting date
        inStockChart.SetInputValue(2, aCrawlDate) # ending date
        inStockChart.SetInputValue(4, 1) # get a data
        inStockChart.SetInputValue(5, 13)    # sichong
        inStockChart.SetInputValue(6, ord('D')) # chart type,  by Day
        inStockChart.SetInputValue(9, '0')  # modified price? yes. 0 for no
        inStockChart.BlockRequest()
        numofreturned_sichong = inStockChart.GetHeaderValue(3)    # num of rec retrieved
        #print(" Num of retrieved record (SICHONG) = ", numofreturned_sichong)
        if (numofreturned_sichong > 0):
            sichong = inStockChart.GetDataValue(0,0)      
            #print("    sichong was ", sichong)
        else:
            sichong = '-999.0'
            print(itemcode, "                No data. default SICHONG: -999.0")
       

        #sqlStr = ("insert into cybosplus date="+str(aCrawlDate)+", itemcode="+itemcode[1:]+", finalprice="+str(finalprice)+", startprice="+str(startprice))
        #print("insert sqlStr = ", sqlStr)

        add_record = ("insert into datatable "
                      "(date, itemcode, finalprice, startprice, highprice, lowprice, changerate, tradeamount, orgtradeamount, fortradeamount, fortotalamount, forportion, totaltradeprice, totalstocknum, sichong) "
                      "values (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s) "
                      "on duplicate key update "
                      "finalprice=%s, startprice=%s, highprice=%s, lowprice=%s, changerate=%s, tradeamount=%s, orgtradeamount=%s, fortradeamount=%s, fortotalamount=%s, forportion=%s, totaltradeprice=%s, totalstocknum=%s, sichong=%s")
             
        data_record = (aCrawlDate, itemcode[1:], finalprice, startprice, highprice,
                       lowprice, changerate, tramt, orgtramt, fortramt,
                       fortotal, forportion, totaltradeprice, totalstocknum, sichong,
                       finalprice, startprice, highprice,
                       lowprice, changerate, tramt, orgtramt, fortramt,
                       fortotal, forportion, totaltradeprice, totalstocknum, sichong)

        cursor.execute(add_record, data_record)
        cnx.commit()       

cursor.close()
cnx.close()
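
The crawler above stores '-999.0' for changerate, fortramt and totaltradeprice, and its comments suggest deriving them from neighbouring values. A minimal sketch of that derivation follows, with several assumptions: changerate is taken to mean the percent change of the final price against the previous day, the placeholder is handled as the number -999.0 rather than the string the source stores, and the function name `derive_fields` and the dict layout are hypothetical.

```python
MISSING = -999.0  # numeric form of the placeholder used in the source


def derive_fields(prev, curr):
    """Compute changerate, fortramt and totaltradeprice for one day.

    prev / curr: dicts with 'finalprice', 'tramt' and 'fortotal' for the
    previous and the current trading day. prev is None for the first day.
    Any value that cannot be derived stays at MISSING.
    """
    out = {"changerate": MISSING, "fortramt": MISSING, "totaltradeprice": MISSING}

    # totaltradeprice ~ tramt * final price (the approximation named in the source)
    if MISSING not in (curr["finalprice"], curr["tramt"]):
        out["totaltradeprice"] = curr["tramt"] * curr["finalprice"]

    if prev is not None:
        # changerate: percent change against yesterday's final price (assumed meaning)
        if MISSING not in (prev["finalprice"], curr["finalprice"]) and prev["finalprice"] != 0:
            ratio = curr["finalprice"] / prev["finalprice"]
            out["changerate"] = round((ratio - 1) * 100, 4)
        # fortramt: today's foreign holdings minus yesterday's
        if MISSING not in (prev["fortotal"], curr["fortotal"]):
            out["fortramt"] = curr["fortotal"] - prev["fortotal"]
    return out
```

Such a pass could run over the stored rows of each item in date order after crawling, leaving the crawl loop itself unchanged.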

 


 

posted by Dr.Deeeep