
COMPLEX LEARNING MODEL TO PREDICT WHETHER IT IS SAFE OR UNSAFE TO BET ON THE HORSE RACE

KEY TOPICS
1. Importance of this objective
2. Nature of the input data
3. Architecture overview
4. Prediction accuracy

INTRODUCTION: Predicting the winner (or the first three positions) of a horse race is an interesting challenge in machine learning. Many models have been proposed to predict the winner with a certain degree of accuracy. Still, the odds of winning a bet remain low because of the chaotic nature of horse racing, so many ideas have been proposed to increase prediction accuracy. Here I discuss an idea in which, instead of trying to improve the ranking directly, we classify races as safe or unsafe to bet on, based on the relative strengths of the horses participating in the race.

Objective: To classify a given set of races as safe or unsafe to bet on, based on the relative strengths of the horses participating in each race.

Importance of this objective: A model that meets this objective tells us how far to trust the prediction made by the ranking algorithm for the race under consideration. Ultimately, one can save money by staying out of a race that the model classifies as unsafe to bet on. Used in combination with the ranking algorithm (developed by thoroughbred racing committee), this model improves the chances of making money when betting on horse races.

Training Data: Each instance of the training data consists of the features of the horses participating in one race. A 0/1 label (referred to as the "training label" throughout the document) says whether the ranking algorithm incorrectly (0) or correctly (1) predicted the first three positions of the race.

Nature of the Training Data: Let there be m training instances (m races), r1, r2 ... rm. Each training instance ri contains a variable number of horses. In thoroughbred racing, the average number of horses in a race is between 8 and 10. In general, the number of horses in a race varies from 5 to 20. In the thoroughbred racing industry, the maximum number of horses per race is 15.

The accuracy of the ranking algorithm's prediction of the first three finishing positions is strongly affected by the number of horses participating in the race. Hence the model should be flexible enough to handle this variation. --> [A]

More than the number of horses, the features of the horses participating in the race predominantly decide whether it is safe or unsafe to bet on the race. --> [B]

A race instance 'r' looks like this. (Note: feature names are represented as F1, F2 ... Fn, and horses as H1, H2 ... Hk. HiFj is the value of feature Fj for horse Hi.)

Race instance 'r'

        H1      H2      ...   Hi      ...   Hk
F1      H1F1    H2F1    ...   HiF1    ...   HkF1
F2      H1F2    H2F2    ...   HiF2    ...   HkF2
...
Fj      H1Fj    H2Fj    ...   HiFj    ...   HkFj
...
Fn      H1Fn    H2Fn    ...   HiFn    ...   HkFn

Each race is a k X n matrix (k horses by n features, where k varies from race to race). Hence the model should make its predictions in a way that takes both [A] and [B] into account.
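To make the variable-size input concrete, here is a minimal sketch of how one race instance could be held in memory. The names Race and feature_column, the 0-based feature index and the sample values are my own illustrative assumptions, not part of the original data set.

```python
from typing import List

# One race: k rows (horses), each holding n feature values.
Race = List[List[float]]

def feature_column(race: Race, j: int) -> List[float]:
    """Return the values of feature j (0-based) for every horse in the race."""
    return [horse[j] for horse in race]

# Hypothetical race with k = 3 horses and n = 2 features.
race = [
    [0.8, 12.0],  # H1: F1, F2
    [0.5, 10.5],  # H2: F1, F2
    [0.9, 11.0],  # H3: F1, F2
]
k, n = len(race), len(race[0])
f1_values = feature_column(race, 0)  # H1F1, H2F1, H3F1
```

The length of a feature column equals k, so it changes from race to race. This is the variation that requirement [A] refers to.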

Shortcomings of a simple neural network model: For each feature Fj we have the values H1Fj, H2Fj ... HkFj, one per horse. Using direct mathematical functions such as variance to measure the relative strength gave a poor hypothesis. Hence we develop a two-tier architecture: first a learning algorithm measures this relative strength, and then its output is used as the input to a second learning algorithm, which learns the final hypothesis.
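The text does not say exactly how variance was applied. One plausible reading, sketched here purely as an assumption, is to collapse the k values of each feature into a single variance, which gives a fixed-length vector of n numbers whatever the number of horses. The function name variance_summary and the sample values are hypothetical.

```python
from statistics import pvariance

def variance_summary(race):
    """Collapse a k x n race into n numbers: the variance of each feature across horses."""
    n = len(race[0])
    return [pvariance([horse[j] for horse in race]) for j in range(n)]

# Hypothetical race with 2 horses and 2 features.
example_race = [
    [1.0, 2.0],
    [3.0, 2.0],
]
summary = variance_summary(example_race)
```

A summary like this discards which horse holds which value and how the values are ordered. That may be one reason a fixed formula performs worse than a learned measure of relative strength.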

Architecture Overview
The model consists of two important components:
1. Cluster of neural networks (variable in number).
2. Main logistic regression classifier.

Cluster of Neural Networks: The supporting neural networks try to learn the relation between the label and the values of one feature across the horses of a race instance. A neural network N(k,Fj) is trained for feature Fj, taking the Fj column from races that contain exactly k horses. Example: the neural network N(8,F9) is trained for feature F9 and takes the F9 column from races that contain exactly 8 horses. Hence, for neural network N(k,Fj), the training data is the Fj column of the set {Rk}, the races that have exactly k horses.

The table shows the inputs for each neural network. Columns give the number of horses in the race. For example, the entry in row F1 and column 5 (5 X F1) is the input for neural network N(5,F1).

        5               6               ...   10
F1      F1 of {R5}      F1 of {R6}      ...   F1 of {R10}
F2      F2 of {R5}      F2 of {R6}      ...   F2 of {R10}
F3      F3 of {R5}      F3 of {R6}      ...   F3 of {R10}
...
Fn-1    Fn-1 of {R5}    Fn-1 of {R6}    ...   Fn-1 of {R10}
Fn      Fn of {R5}      Fn of {R6}      ...   Fn of {R10}
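The grouping described by this table can be sketched as a dictionary keyed by (number of horses, feature index). The function name build_training_sets, the 0-based feature index and the tiny sample races (2 and 3 horses, chosen only for brevity) are illustrative assumptions.

```python
from collections import defaultdict

def build_training_sets(races, labels):
    """Map (k, j) to the training samples for network N(k, Fj).

    Each sample is (feature column of length k, training label).
    """
    sets = defaultdict(list)
    for race, label in zip(races, labels):
        k = len(race)                     # number of horses in this race
        for j in range(len(race[0])):     # one column per feature
            column = [horse[j] for horse in race]
            sets[(k, j)].append((column, label))
    return dict(sets)

# Hypothetical data: three races with two features per horse.
races = [
    [[0.8, 12.0], [0.5, 10.5]],              # 2 horses
    [[0.9, 11.0], [0.4, 9.0], [0.7, 10.0]],  # 3 horses
    [[0.6, 11.5], [0.3, 12.5]],              # 2 horses
]
labels = [1, 0, 1]
training_sets = build_training_sets(races, labels)
```

Every race contributes one sample to each of the n networks that match its horse count, so networks for rarely seen horse counts receive few samples.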

The cluster contains one neural network per feature and per possible number of horses, that is, a total of (number of features) * (max number of horses in a race instance - min number of horses in a race instance + 1) networks. For horse counts from 5 to 10, this gives 6 networks per feature. After training the supporting neural networks, we have the learned parameters for every network in the cluster.
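The internals of a supporting network N(k,Fj) are not described here. As a minimal sketch, assuming a single hidden layer, sigmoid activations and plain stochastic gradient descent on the log-loss (all of these are my assumptions, as are the names TinyNet, hidden, epochs and lr), one such network could be trained as follows.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class TinyNet:
    """One-hidden-layer network: k inputs (one per horse), one output in (0, 1)."""

    def __init__(self, k, hidden=4, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(k)] for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
        self.b2 = 0.0

    def forward(self, x):
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        out = sigmoid(sum(w * hi for w, hi in zip(self.w2, h)) + self.b2)
        return h, out

    def predict(self, x):
        return self.forward(x)[1]

    def train(self, samples, epochs=200, lr=0.5):
        for _ in range(epochs):
            for x, y in samples:
                h, out = self.forward(x)
                d_out = out - y  # log-loss gradient at the output pre-activation
                for i, hi in enumerate(h):
                    # Backpropagate through hidden unit i before updating w2[i].
                    d_h = d_out * self.w2[i] * hi * (1.0 - hi)
                    self.w2[i] -= lr * d_out * hi
                    self.b1[i] -= lr * d_h
                    for j, xj in enumerate(x):
                        self.w1[i][j] -= lr * d_h * xj
                self.b2 -= lr * d_out

# Hypothetical samples for a network that handles races with k = 3 horses.
samples = [([1.0, 0.0, 0.0], 1), ([0.0, 0.0, 1.0], 0)]
net = TinyNet(k=3)
net.train(samples)
score_a = net.predict([1.0, 0.0, 0.0])
score_b = net.predict([0.0, 0.0, 1.0])
```

In practice a library implementation would replace this hand-written loop. The point is only that each network takes a fixed-length column of k values and returns a single score.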

Pictorial representation of the neural network N(5,F1)

Creating the input for the Logistic regression classifier

We pass the training instance to a function that picks the correct neural network based on the feature being processed and the number of horses in the instance. If there are n features and k horses in a race instance 'r', the neural networks used to produce the inputs for the next phase are N(k,F1), N(k,F2) ... N(k,Fn). The outputs of these networks, o(F1), o(F2), o(F3) ... o(Fn), together with the label of 'r', form one input instance for the logistic regression classifier.

E.g., consider the generic race instance with k horses:

        H1      H2      ...   Hi      ...   Hk
F1      H1F1    H2F1    ...   HiF1    ...   HkF1
F2      H1F2    H2F2    ...   HiF2    ...   HkF2
...
Fj      H1Fj    H2Fj    ...   HiFj    ...   HkFj
...
Fn      H1Fn    H2Fn    ...   HiFn    ...   HkFn

After each feature row passes through its relevant neural network, we get one output (between 0 and 1) per feature, which serves as an input to the logistic regression classifier:

F1      o(F1)
F2      o(F2)
...
Fj      o(Fj)
...
Fn      o(Fn)
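The selection step can be sketched as a lookup keyed by (number of horses, feature index). In this sketch the trained networks are replaced by a placeholder function that averages its inputs, purely so the example runs on its own. The names stage_two_input, networks and placeholder are hypothetical.

```python
def stage_two_input(race, networks):
    """Turn a k x n race into n scores, one per feature.

    networks maps (k, j) to a callable that takes the length-k column of
    feature j and returns a score between 0 and 1.
    """
    k = len(race)
    n = len(race[0])
    return [networks[(k, j)]([horse[j] for horse in race]) for j in range(n)]

def placeholder(column):
    """Stand-in for a trained network; NOT a real model."""
    return sum(column) / len(column)

# Hypothetical cluster covering races with 2 horses and 2 features.
networks = {(2, 0): placeholder, (2, 1): placeholder}
race = [[0.25, 0.5], [0.75, 1.0]]
features = stage_two_input(race, networks)
```

Whatever the value of k, the result has exactly n entries, which is what lets a single logistic regression classifier handle races of every size.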

Logistic regression classifier


The logistic regression classifier is trained on these outputs against the label of the respective race instance to learn the final hypothesis.
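A minimal sketch of this second tier, assuming plain stochastic gradient descent on the log-loss and a 0.5 decision threshold (the training procedure, the threshold, the function names and the sample scores are all my assumptions):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, epochs=500, lr=0.5):
    """Fit weights w and bias b so that sigmoid(w . x + b) approximates the label."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # gradient of the log-loss with respect to the logit
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

def is_safe(w, b, x, threshold=0.5):
    """Classify a race as safe to bet on when the predicted probability is high enough."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= threshold

# Hypothetical first-tier outputs o(F1), o(F2) for four races, with their labels.
X = [[0.9, 0.8], [0.8, 0.9], [0.2, 0.1], [0.1, 0.3]]
y = [1, 1, 0, 0]
w, b = train_logistic(X, y)
safe_race = is_safe(w, b, [0.85, 0.85])
unsafe_race = is_safe(w, b, [0.1, 0.1])
```

Raising the threshold above 0.5 would make the model more conservative about calling a race safe, at the cost of skipping more races.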

2 - tier Architecture involving Neural Network and Logistic regression for horse race classification.

Prediction Accuracy
The prediction accuracy is considerably better than that of approaches that use hand-written mathematical functions to compare the relative strengths. The prediction accuracy is around 85%. It is not higher because there is little data on races that involve more than 12 horses.

Thank you Note:

I sincerely thank Mr. Appolos Coleman for helping me by providing the necessary information and exposure on horse racing, as well as the training data.
