
📘 Choosing a Topic

Recently, while studying datasets and various algorithms in a data mining course, I was trying to decide what topic to investigate on Kaggle.
I figured that a topic I already knew well would keep me interested and motivated. While searching, I came across a dataset of LCK (League of Legends Champions Korea) players' per-season performance and other statistics.
That made me curious whether LCK players' results are correlated with those other factors, so I decided to look into it.


📌 Dataset Used

  • Topic: correlation between player win rate in the LCK and other statistics
  • Dataset: https://www.kaggle.com/datasets/jackhan9811/lckdataset

image

  • Metadata: about 26 fields (columns)

image

image


📖 Preprocessing

First, I opened the data with Python's pandas.
That showed anomalous values such as ? and missing values marked as -, as shown below.

image

To fix this, instead of loading the file directly with pandas, I read it with Python's csv module and cleaned the cells with replace before building the DataFrame.

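import csv
import pandas as pd
from sklearn.cluster import KMeans
import numpy as np
import seaborn as sb
import matplotlib.pyplot as plt

# Read the raw file with the csv module so each cell can be cleaned
# before the DataFrame is built.
with open("LCK/2016_LCK_SPRING.csv", newline="") as f:
    data = csv.reader(f)

    # The first row holds the column names (a flat list, not a list of lists).
    header = next(data)

    data2 = []
    for row in data:
        # Column 0 (player name): drop stray "?" characters.
        row[0] = row[0].replace("?", "")
        # Columns 2-15: a lone "-" marks a missing value, stored as 0.
        # (Replacing every "-" would also turn a negative like "-1.5" into 1.5.)
        for i in range(2, 16):
            cell = row[i].strip()
            row[i] = 0.0 if cell == "-" else float(cell)
        # Columns 16-25: remove "?" first, then treat a lone "-" the same way.
        for i in range(16, 26):
            cell = row[i].replace("?", "").strip()
            row[i] = 0.0 if cell == "-" else float(cell)
        data2.append(row)

df = pd.DataFrame(data2, columns=header)

# Keep the columns at positions 5, 6 and 7 for the analysis.
df_f = df.iloc[:, 5:8].copy()
df_f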
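For comparison, here is a minimal sketch of the same cleanup done directly in pandas, assuming the only problems are stray "?" characters in the name column and cells that are exactly "-". The column names (Player, Team, KDA, ...) and the rows are hypothetical, made up for illustration; the real Kaggle file has about 26 columns.

```python
import io

import pandas as pd

# Hypothetical two-row excerpt; names, columns and values are made up.
raw = io.StringIO(
    "Player,Team,KDA,Avg Kills,Avg Deaths,Win Rate\n"
    "PlayerA?,TeamX,4.5,3.2,1.8,70\n"
    "PlayerB,TeamX,-,2.9,-,65\n"
)

# A cell that is exactly "-" is read as NaN, then filled with 0
# (the same choice the csv-based code makes).
df = pd.read_csv(raw, na_values=["-"])
num_cols = df.columns[2:]
df[num_cols] = df[num_cols].fillna(0).astype(float)

# Strip stray "?" characters from the player-name column (literal match, not regex).
df["Player"] = df["Player"].str.replace("?", "", regex=False)
print(df)
```

Letting read_csv recognize "-" as missing keeps a negative number such as "-1.5" intact, because only cells equal to "-" become NaN.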


๐Ÿ“– ๋ฐ์ดํ„ฐ ์ถ”์ถœํ•˜๊ธฐ

๋ฐ์ดํ„ฐ์…‹์—์„œ ๋ชจ๋“  ๋ฐ์ดํ„ฐ๋ฅผ ์‚ฌ์šฉํ•˜๋Š” ๊ฒƒ์ด ์•„๋‹ˆ๋ผ, ๊ฐ€์žฅ ์—ฐ๊ด€๋˜์–ด ์žˆ์–ด ๋ณด์ด๋Š” ๋ฐ์ดํ„ฐ์ธ KDA, Avg Kills, Avg Death, Win Rate ๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ๋น„๊ตํ•˜๊ธฐ๋กœ ํ•˜์˜€๋‹ค.
๊ทธ๋ž˜์„œ ๋‹ค์Œ๊ณผ ๊ฐ™์€ ๊ฒฐ๊ณผ๊ฐ€ ๋‚˜์™”๋‹ค.


image
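The positional slice df.iloc[:, 5:8] picks columns by position, so it silently selects different data if the column order changes (for example in another season's file). Below is a minimal sketch of selecting the four columns by name instead. The frame and its column labels are hypothetical stand-ins for the cleaned df; the exact header strings in the Kaggle CSV may differ, so check df.columns first.

```python
import pandas as pd

# Hypothetical cleaned frame; values and column labels are assumptions.
df = pd.DataFrame({
    "Player": ["PlayerA", "PlayerB", "PlayerC"],
    "KDA": [4.5, 2.1, 3.3],
    "Avg Kills": [3.2, 1.5, 2.4],
    "Avg Deaths": [1.8, 3.0, 2.2],
    "Win Rate": [70.0, 35.0, 55.0],
})

# Select by label: a missing or renamed column raises KeyError
# instead of quietly returning the wrong data.
features = ["KDA", "Avg Kills", "Avg Deaths", "Win Rate"]
df_f = df.loc[:, features]
print(df_f.describe())
```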


๐Ÿ“– ๋ฐ์ดํ„ฐ ์‹œ๊ฐํ™”

I visualized the data with pyplot from the matplotlib package, as follows.


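# Assumes df_f from the preprocessing step, with KDA in column 0
# and Win Rate in column 1.
X = df_f.to_numpy()
plt.scatter(X[:, 0], X[:, 1], c="black", label="KDA/Win Rate")
plt.xlabel("KDA")
plt.ylabel("Win Rate")
plt.title("KDA/Win Rate")
plt.show()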

image

The resulting plot is shown above.
The higher the KDA and Avg Kills, the higher the Win Rate; the lower the Avg Deaths, the higher the Win Rate.

๊ทผ๋ฐ ์—ฌ๊ธฐ์„œ ํ•œ๊ฐ€์ง€ ์˜๋ฌธ์ ์ด ์ƒ๊ฒผ๋Š”๋ฐ, 1๋ฒˆ ๊ทธ๋ฆผ๊ฐ™์€ ๊ฒฝ์šฐ ์™œ ๊ฐ€์žฅ ์˜ค๋ฅธ์ชฝ์— ์žˆ๋Š” ๊ฐ’์ด ํŒŒ๋ž€์ƒ‰์ผ๊นŒ ๋ผ๋Š” ์ƒ๊ฐ€์ด์˜€๋‹ค.
K-means๋ž€ ์ค‘์‹ฌ์ ์œผ๋กœ๋ถ€ํ„ฐ์˜ ๊ฑฐ๋ฆฌ์— ๋”ฐ๋ผ์„œ ๊ตฐ์ง‘์„ ๋‚˜๋ˆ„๋Š” ๋ฐฉ๋ฒ•์ธ๋ฐ ๋” ๊ฐ€๊นŒ์šด ์ค‘์‹ฌ์ ์ด ์žˆ์Œ์—๋„ ๋ถˆ๊ตฌํ•˜๊ณ , ๋‹ค๋ฅธ ์ƒ‰์ƒ์ด ๋˜๋Š” ๊ฒƒ์ด์˜€๋‹ค.
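One possible explanation, not a confirmed answer: K-means assigns each point to the nearest centroid in the feature space it was fit on. If the plot's axes are drawn at different scales, or if the model was fit on more columns than the two that are plotted (df_f holds three), a point can look closer to another centroid on the chart than it actually is. The sketch below uses made-up (KDA, Win Rate) points, fits scikit-learn's KMeans on raw and on standardized (StandardScaler) features, and checks that every label is the Euclidean-nearest centroid in that space. The helper nearest_centroid is hypothetical, written only for this check.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical (KDA, Win Rate in %) points, NOT taken from the LCK dataset.
X = np.array([
    [2.0, 40.0], [2.5, 45.0], [3.0, 42.0],
    [5.0, 70.0], [5.5, 75.0], [6.0, 72.0],
    [9.0, 48.0],  # very high KDA, but an average win rate
])


def nearest_centroid(points, centers):
    """Index of the closest centroid (Euclidean distance) for each point."""
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)


# Raw units: Win Rate spans a much wider numeric range than KDA,
# so it dominates the Euclidean distance.
km_raw = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
raw_labels = km_raw.predict(X)

# Standardized units: each feature is rescaled to mean 0 and std 1 first.
X_std = StandardScaler().fit_transform(X)
km_std = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_std)
std_labels = km_std.predict(X_std)

# Each label is the nearest centroid in the space the model was fit on,
# which need not match what looks nearest on an unevenly scaled chart.
assert np.array_equal(raw_labels, nearest_centroid(X, km_raw.cluster_centers_))
assert np.array_equal(std_labels, nearest_centroid(X_std, km_std.cluster_centers_))
print("raw:", raw_labels)
print("standardized:", std_labels)
```

Comparing the two label arrays shows whether rescaling the features changes which cluster the high-KDA point joins; calling plt.gca().set_aspect("equal") on a plot of the fitted features makes on-screen distances match the distances K-means uses.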




๊ฐœ์ธ ๊ณต๋ถ€ ๊ธฐ๋ก์šฉ ๋ธ”๋กœ๊ทธ์ž…๋‹ˆ๋‹ค.
ํ‹€๋ฆฌ๊ฑฐ๋‚˜ ์˜ค๋ฅ˜๊ฐ€ ์žˆ์„ ๊ฒฝ์šฐ ์ œ๋ณดํ•ด์ฃผ์‹œ๋ฉด ๊ฐ์‚ฌํ•˜๊ฒ ์Šต๋‹ˆ๋‹ค.๐Ÿ˜