An Approach to Predict Outcomes in Sports Games with Bigdata Techniques and Data Mash-Up
Jee-Ah Shin1, Jin-Hwa Kim2, Joo-Yong Lee3
1Jee-Ah Shin, Department of School of Business, Sogang University, Shinsoo-Dong, Mapo-Gu, Seoul, (Korea), East Asian.
2Jin-Hwa Kim, Department of School of Business, Sogang University, Shinsoo-Dong, Mapo-Gu, Seoul, (Korea), East Asian.
3Joo-Yong Lee, Department of School of Business, Sogang University, Shinsoo-Dong, Mapo-Gu, Seoul, (Korea), East Asian.
Manuscript received on 20 June 2019 | Revised Manuscript received on 27 June 2019 | Manuscript Published on 22 June 2019 | PP: 204-209 | Volume-8 Issue-8S2 June 2019 | Retrieval Number: H10370688S219/19©BEIESP
© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open-access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: This study predicts the performance of players and teams in baseball games using data mining and data mash-up. A decision tree technique and a data mash-up approach are used for the predictions. A data set covering 111 games played by one of the most outstanding baseball players in South Korea, player A (S.Y. Lee), and his team was collected for the study. Three data sets from three different sources are combined in the mash-up: the Korea Baseball Organization (KBO), the Korea Meteorological Administration, and Google Trends. The analysis yields three findings. First, the important variables for ‘H’ (hits) are Google Trends interest and humidity. If ‘trend’ is 25 or more and ‘humid’ is 77.1 or more, the probability of making one or more hits is 85.7%. Player A is therefore most likely to make one or more hits when public interest is high and the weather is humid. Second, the number of spectators and humidity are significant for ‘BB’ (base on balls). If ‘spect’ is 12823.5 or more and ‘humd’ is at least 62.2 but below 73.2, the probability of a ‘BB’ is 57.1%. That is, the probability of a base on balls (walk) is high when there are many spectators and it is moderately humid. Third, wind speed and temperature are important for a good ‘result’. If ‘wind’ is 2.75 or more and ‘temp’ is 25.1 or more, the probability of a team victory is 100%. In this data set, his team wins when there is at least a little wind and the temperature is high. This study focuses on one baseball team and one player. Further study can extend the scope of application to other teams and other sports.
Keywords: Sports Game, Prediction, Data Mining, Data Mash-up, Decision Tree.
Scope of the Article: Data Mining
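The mash-up and the three decision rules reported in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. The join key (one record per game day), the dictionary layout, the function names `mash_up` and `leaf_probabilities`, and all sample values are assumptions made for illustration. Only the thresholds and leaf probabilities come from the abstract. The abstract spells the humidity variable both ‘humid’ and ‘humd’; the sketch uses `humd` throughout.

```python
# Illustrative sketch only. Thresholds and probabilities are the ones quoted
# in the abstract; the data layout, join key, and sample values are made up.

def mash_up(kbo, weather, trends):
    """Inner-join three dicts keyed by game day into a list of merged rows."""
    shared_days = sorted(set(kbo) & set(weather) & set(trends))
    return [{"day": d, **kbo[d], **weather[d], **trends[d]} for d in shared_days]


def leaf_probabilities(row):
    """Return the quoted leaf probability for each abstract rule the row satisfies."""
    matched = {}
    # Rule for 'H' (hits): high search interest and high humidity.
    if row["trend"] >= 25 and row["humd"] >= 77.1:
        matched["hit"] = 0.857
    # Rule for 'BB' (base on balls): many spectators, moderate humidity.
    if row["spect"] >= 12823.5 and 62.2 <= row["humd"] < 73.2:
        matched["walk"] = 0.571
    # Rule for 'result' (team victory): some wind and high temperature.
    if row["wind"] >= 2.75 and row["temp"] >= 25.1:
        matched["win"] = 1.0
    return matched


# Hypothetical inputs standing in for the three sources (KBO, weather, Google Trends).
kbo = {"day-1": {"spect": 15000}, "day-2": {"spect": 9000}, "day-3": {"spect": 13000}}
weather = {
    "day-1": {"temp": 27.0, "humd": 80.0, "wind": 3.0},
    "day-2": {"temp": 20.0, "humd": 65.0, "wind": 1.0},
}
trends = {"day-1": {"trend": 30}, "day-2": {"trend": 10}, "day-3": {"trend": 40}}

for row in mash_up(kbo, weather, trends):
    print(row["day"], leaf_probabilities(row))
```

The inner join drops any game day that is missing from one of the three sources (here `day-3`, which has no weather record). Whether the study handled missing records this way is not stated in the abstract.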