
An Enhancement Study of Existing Methods for Plagiarism Detection Using Soft Computing Techniques

ABSTRACT:
Ever since we entered the digital communication era, the ease of information sharing
through the internet has encouraged online literature searching. With this comes the potential risk
of a rise in academic misconduct and intellectual property theft. As concerns over plagiarism
grow, more attention has been directed towards automatic plagiarism detection. This is a
computational approach which assists humans in judging whether pieces of text are plagiarized.
However, most existing plagiarism detection approaches are limited to superficial, brute-force
string matching techniques. If the text has undergone substantial semantic and syntactic changes,
string-matching approaches do not perform well. In order to identify such changes, linguistic
techniques which are able to perform a deeper analysis of the text are needed. To date, very
limited research has been conducted on the topic of utilizing linguistic techniques in plagiarism
detection. This study provides novel perspectives on plagiarism detection and plagiarism direction
identification tasks. The hypothesis is that original texts and rewritten texts exhibit significant
but measurable differences, and that these differences can be captured through statistical and
linguistic indicators. To investigate this hypothesis, four main research objectives are defined.
First, a novel framework for plagiarism detection is proposed. It involves the use of Natural
Language Processing techniques, rather than relying only on the traditional string-matching
approaches. The objective is to investigate and evaluate the influence of text pre-processing, and
statistical, shallow and deep linguistic techniques using a corpus-based approach. This is
achieved by evaluating the techniques in two main experimental settings. Second, the role of
machine learning in this novel framework is investigated. Further experiments show that
combining shallow and deep techniques helps improve the classification of plagiarized texts
by reducing the number of false negatives. In addition, the experiment on plagiarism direction
detection shows that rewritten texts can be identified by statistical and linguistic traits.

KEYWORDS: Plagiarism detection tool, classification, Natural Language Processing, machine learning techniques

Existing System:

To “plagiarize” is to take something (ideas, documents, code, images, etc.) from another and
pass it off as one's own without citation. Plagiarism is a global problem that occurs in
many different areas of life, and it takes many different forms. Plagiarism at
schools can be highly demotivating for teachers and for students alike. If plagiarism is
not addressed sufficiently, plagiarists could gain an undeserved advantage, e.g. more marks for their
assignments with less effort.
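
The abstract notes that most existing detection approaches rely on string matching. A minimal sketch of such a baseline is given here, assuming word n-gram overlap scored with the Jaccard coefficient. The class `NGramOverlap` and its method names are hypothetical and are not taken from any existing tool.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration of a string-matching baseline (not the proposed tool).
class NGramOverlap {

    // Lowercase the text, replace punctuation with spaces, and collect its word n-grams.
    static Set<String> wordNGrams(String text, int n) {
        String cleaned = text.toLowerCase().replaceAll("[^a-z0-9\\s]", " ").trim();
        Set<String> grams = new HashSet<String>();
        if (cleaned.isEmpty()) {
            return grams;
        }
        String[] words = cleaned.split("\\s+");
        for (int i = 0; i + n <= words.length; i++) {
            StringBuilder gram = new StringBuilder();
            for (int j = i; j < i + n; j++) {
                if (j > i) {
                    gram.append(' ');
                }
                gram.append(words[j]);
            }
            grams.add(gram.toString());
        }
        return grams;
    }

    // Jaccard coefficient: size of the intersection divided by size of the union.
    // Returns 0.0 when neither text yields any n-gram.
    static double jaccard(String a, String b, int n) {
        Set<String> gramsA = wordNGrams(a, n);
        Set<String> gramsB = wordNGrams(b, n);
        Set<String> union = new HashSet<String>(gramsA);
        union.addAll(gramsB);
        if (union.isEmpty()) {
            return 0.0;
        }
        Set<String> intersection = new HashSet<String>(gramsA);
        intersection.retainAll(gramsB);
        return (double) intersection.size() / union.size();
    }
}
```

Under this measure a verbatim copy scores 1.0, while a paraphrase that shares no three-word sequence with its source scores 0.0 for n = 3. This illustrates why string matching alone can miss heavily rewritten text.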

Disadvantages:

1. Existing detection techniques do not address plagiarism sufficiently.

2. Unaddressed plagiarism is highly demotivating for teachers.

Proposed System:

The proposed system investigates the role of machine learning in the plagiarism detection
framework. The objective is to determine whether the application of machine learning in the plagiarism
detection task is helpful. This is achieved by comparing a threshold setting approach against a
supervised machine learning classifier. Third, the prospect of applying the proposed framework
in a large-scale scenario is explored. The objective is to investigate the scalability of the
proposed framework and algorithms. This is achieved by experimenting with a large-scale corpus
in three stages. The first two stages are based on longer texts and the final stage is based
on segments of text. Finally, the plagiarism direction identification problem is explored as
supervised machine learning classification and ranking tasks. Statistical and linguistic features
are investigated individually or in various combinations. The objective is to introduce a new
perspective on the traditional brute-force pair-wise comparison of texts. Instead of comparing
original texts against rewritten texts, features are drawn based on traits of texts to build a pattern
for original and rewritten texts. Thus, the classification or ranking task is to fit a piece of text into
a pattern. The framework is tested by empirical experiments, and the results from initial
experiments show that deep linguistic analysis contributes to solving the problems.
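
The comparison between a threshold setting approach and a supervised classifier described above can be illustrated with a minimal sketch. It assumes each text pair has already been reduced to a single similarity score. It also uses a one-feature decision stump (a cut-off learned from labeled pairs) as a stand-in for the supervised classifier, since the description does not say which classifier is used. The class `ThresholdVsLearned` and its method names are hypothetical.

```java
// Hypothetical illustration: fixed cut-off versus a cut-off learned from labeled pairs.
class ThresholdVsLearned {

    // Threshold rule: flag a pair as plagiarized when its score reaches the cut-off.
    static boolean flag(double score, double threshold) {
        return score >= threshold;
    }

    // Fraction of labeled pairs that the threshold rule classifies correctly.
    static double accuracy(double[] scores, boolean[] labels, double threshold) {
        int correct = 0;
        for (int i = 0; i < scores.length; i++) {
            if (flag(scores[i], threshold) == labels[i]) {
                correct++;
            }
        }
        return (double) correct / scores.length;
    }

    // Decision stump: try every observed score as a candidate cut-off and keep
    // the first one with the highest training accuracy.
    // Assumes at least one labeled pair is supplied.
    static double learnThreshold(double[] scores, boolean[] labels) {
        double best = scores[0];
        double bestAccuracy = -1.0;
        for (double candidate : scores) {
            double acc = accuracy(scores, labels, candidate);
            if (acc > bestAccuracy) {
                bestAccuracy = acc;
                best = candidate;
            }
        }
        return best;
    }
}
```

Calling `accuracy` once with a hand-picked cut-off and once with the value returned by `learnThreshold` gives a direct comparison of the two settings. In practice the comparison would be made on pairs held out from training.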

Advantages:

1. The proposed detection tool is expected to detect plagiarism with high accuracy.

2. Compared to existing tools, the proposed tool is expected to give better results.

Software Requirements:
Language : Java (JDK 1.7.0)
Frontend : JSP, Servlets
Backend : Oracle 10g
IDE : MyEclipse 8.6
Operating System : Windows XP
Server : Tomcat

Hardware Requirements:
Processor : Pentium IV
Hard Disk : 80GB
RAM : 2GB
