Auto-grading Questions

Task
• Short-answer questions can target learning goals more effectively than multiple choice because they eliminate test-taking shortcuts such as ruling out improbable answers. However, staff grading of textual answers simply doesn't scale to massive classes: grading answers has always been time-consuming and costs a great deal of public money in the US.
• In this project we start with the simplest version of the problem: building a machine-learning system that automatically grades one-line answers against given reference answers.

Dataset and Features
• We chose the publicly available Student Response Analysis (SRA) dataset; within it we used the SciEntsBank part.
• The dataset consists of 135 questions from various physical-science domains, each with a reference short answer and 36 student responses.
• The total dataset size is 4860 data points.
• Ground-truth labels in the dataset mark each student response as correct or incorrect.

K-Nearest Neighbours
• Pipeline: word embeddings → weighted cosine sentence similarity → k-nearest neighbours. Accuracy = 79%.
• Each word is weighted as
  weight = idf * (w_pos - w_neg)
  where w_pos = (correct answers containing the word) / (total number of correct answers) and w_neg = (wrong answers containing the word) / (total number of wrong answers).

Hybrid Siamese Neural Network
• We extended the Siamese neural network to use a bidirectional LSTM with an attention layer, and combined it with the k-NN intuition to achieve better results.
• The branches of the network learn a sentence embedding for the student answer and for the reference answer. After merging, a fully connected layer measures the similarity between the two answers and scores the response as correct or incorrect.
• In the initial models we used Manhattan distance and cosine similarity as the similarity metric.
• Data split: 80% train, 10% validation, 10% test. Loss: binary cross-entropy. Optimizer: Adam. Epochs: 50. Attention layer: softmax.

Results

Model                          Accuracy   MSE
LSTM + Manhattan distance [1]  62%        0.25
LSTM + attention + FNN [2]     73%        0.18
CNN + Bi-LSTM + Manhattan      69%        0.20
Our model                      76%        0.16

Unseen Questions
Q: What is the relation between tree rings and time?
Ref: As time increases, the number of tree rings also increases.
Ans: They are both increasing.
Original label: Correct
Model result: Misclassified due to missing keywords

Discussion
• Evaluation of sentence similarity can be improved by providing a sentence-similarity index in addition to 0/1 labels.
• In the k-NN approach we found that correct responses are unexpectedly very similar, so we inserted more reference answers to cover all writing styles and reinforce the algorithm's similarity detection.
• The hybrid model tends to misclassify long sentences; this could probably be improved by using a different attention layer.
• The hybrid model also especially misclassifies sentences in which a keyword is missing or expressed in some other form.
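To make the k-NN scoring concrete, here is a minimal Python sketch of the per-word weighting weight = idf * (w_pos - w_neg) and of a weighted cosine similarity between answers as bags of words. The function names and the choice to compute idf over the pooled answer set are assumptions; the poster does not specify the exact implementation.

```python
import math
from collections import Counter

def word_weights(correct_answers, wrong_answers, all_answers):
    """Per-word weight = idf * (w_pos - w_neg), following the formula above.

    w_pos: fraction of correct answers containing the word
    w_neg: fraction of wrong answers containing the word
    idf:   computed over the pooled answer set (an assumption; the poster
           does not say which corpus the idf statistics come from).
    Each answer is a list of tokens.
    """
    vocab = {w for ans in all_answers for w in ans}
    n = len(all_answers)
    weights = {}
    for w in vocab:
        df = sum(1 for ans in all_answers if w in ans)
        idf = math.log(n / df)
        w_pos = sum(1 for ans in correct_answers if w in ans) / len(correct_answers)
        w_neg = sum(1 for ans in wrong_answers if w in ans) / len(wrong_answers)
        weights[w] = idf * (w_pos - w_neg)
    return weights

def weighted_cosine(a, b, weights):
    """Cosine similarity between the two answers' weighted bag-of-words vectors."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(weights.get(w, 0.0) ** 2 * ca[w] * cb[w] for w in ca if w in cb)
    na = math.sqrt(sum((weights.get(w, 0.0) * c) ** 2 for w, c in ca.items()))
    nb = math.sqrt(sum((weights.get(w, 0.0) * c) ** 2 for w, c in cb.items()))
    return dot / (na * nb) if na and nb else 0.0
```

A k-NN classifier would then label a student answer by the majority label of its most similar reference/graded answers under this metric.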
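The similarity metrics used in the initial models (Manhattan distance and cosine) and the fully connected scoring step of the hybrid model can be sketched as follows. The exponential form exp(-||h1 - h2||_1) is the Manhattan-based similarity of the Siamese LSTM in [1]; the plain-concatenation merge and single sigmoid unit in fc_score are simplifying assumptions, since the poster does not detail the merge.

```python
import numpy as np

def manhattan_similarity(h1, h2):
    """exp(-||h1 - h2||_1), as in the Siamese LSTM of [1]:
    1.0 for identical sentence embeddings, decaying toward 0 as they diverge."""
    return float(np.exp(-np.sum(np.abs(h1 - h2))))

def cosine_similarity(h1, h2):
    """Standard cosine similarity between the two branch embeddings."""
    den = float(np.linalg.norm(h1) * np.linalg.norm(h2))
    return float(np.dot(h1, h2)) / den if den else 0.0

def fc_score(h1, h2, W, b):
    """Fully connected scoring head on the merged branch outputs.
    Merging by plain concatenation [h1; h2] is an assumption; the poster
    only says the branches are merged before a fully connected layer.
    Returns the sigmoid probability that the student answer is correct."""
    z = np.concatenate([h1, h2]) @ W + b
    return float(1.0 / (1.0 + np.exp(-z)))
```

With binary cross-entropy loss (as in the poster's training setup), the network is trained so that fc_score approaches 1 for correct answers and 0 for incorrect ones.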
Data Pre-processing
• Pre-processing includes tokenization, stemming, and spell checking of each student response.
• We used pre-trained GloVe embeddings, trained on Wikipedia and Gigaword 5, with a 400K vocabulary and 300 dimensions.

Future Work
• Try different attention layers to mitigate the missing-keyword issue.
• Improve the model and run it on a larger unseen, out-of-domain dataset to gauge its robustness.
• Add better reference answers or better similarity-detection mechanisms.

References
1. Jonas Mueller and Aditya Thyagarajan, "Siamese Recurrent Architectures for Learning Sentence Similarity", AAAI-16.
2. Ziming Chi and Bingyan Zhang, "A Sentence Similarity Estimation Method Based on Improved Siamese Network", JILSA, 2018.
3. Tianqi Wang et al., "Identifying Current Issues in Short Answer Grading", ANLP-2018.
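The pre-processing step described above (tokenization and stemming) can be sketched minimally as below. The crude suffix stripper is an assumption standing in for a real stemmer such as Porter's, and spell checking is omitted here.

```python
import re

def tokenize(text):
    """Lowercase the answer and split on non-alphanumeric characters."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]

def stem(token):
    """Crude suffix stripper standing in for a proper stemmer
    (e.g. Porter stemming); shown only to illustrate the pipeline."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def preprocess(answer):
    """Tokenize and stem one student response."""
    return [stem(t) for t in tokenize(answer)]
```

The resulting tokens would then be mapped to their 300-dimensional GloVe vectors before being fed to the network.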