
Data Mining Techniques for Crime Pattern Recognition and Architecting an Early Warning System
Saurav Jana, 08BM8010 MBA
Vijay Kumar Das, 08BM8003 MBA

Abstract

This paper describes data mining techniques that can be used to detect crime patterns, both by region and by behavior or nature. It seeks a framework for an Early Warning System and proposes an architecture for the design of such a system.

Introduction

Data mining and data analysis techniques are powerful tools that help law-enforcement officers gather intelligence and counter terrorism. With the increasing use of computerized and automated systems to track crimes, computer data analysts have been helping law-enforcement officers, sleuths and detectives to speed up the process of solving crimes. In this paper we take an approach that combines computer science and criminal justice to study a data mining technique that can help solve crimes faster. We use clustering-based models to help identify crime patterns on the basis of their characteristics. We then use this data to build an Early Warning System for possible future threats and attacks by terrorists and offenders.

Survey of Literature

The term ‘Data mining’ is often a source of confusion. Data mining, as described in Wikipedia [1], is the process of extracting patterns from data. Data mining is becoming an increasingly
important tool for transforming this data into information and giving it meaning through analysis. What data mining does is “discover useful, previously unknown knowledge by analyzing large and complex” datasets [2].

Clustering, or cluster analysis, is the grouping of a set of observations into groups called clusters, so that observations in the same cluster are similar in some sense. Clustering is a method of unsupervised learning and a common technique for statistical data analysis used in many fields, including machine learning, data mining, pattern recognition and bioinformatics.

Here we discuss the k-means [3] algorithm for clustering. It assigns each point to the cluster whose center, also called the centroid, is nearest. The center is the average of all the points in the cluster; that is, its coordinates are the arithmetic mean, taken for each dimension separately, over all the points in the cluster. Example: the data set has three dimensions and the cluster has two points, X = (x1, x2, x3) and Y = (y1, y2, y3). Then the centroid Z becomes Z = (z1, z2, z3), where z1 = (x1 + y1)/2, z2 = (x2 + y2)/2 and z3 = (x3 + y3)/2.

A warning system is any system of biological or technical nature deployed by an individual or group to inform of a future danger. Its purpose is to enable the deployer of the warning system to prepare for the danger and act accordingly to mitigate or avoid it.

Early warning can be defined as information that a national government or an international or regional organization receives in advance in order to be able to react in a timely and effective manner to a crisis situation. An early warning system [4] in the context of crime reporting will need to address the following questions:

1) to identify “who is doing what”
2) to estimate “who can do what” in order to
3) predict “what can really happen tomorrow”.

Crime Reporting Systems and Databases:
Most police departments use some electronic systems for crime reporting that have replaced the traditional time-consuming
paper-based crime reports. These crime reports contain the following categories of information: type of crime, date, time, location, weaponry, etc. There is information about the suspect (identified or unidentified), the victim and the witness. There is also the narrative information or description of the crime and the Modus Operandi (MO), which is usually in text form. Police officers or detectives use free text to record most of their observations that are particular to a case and cannot be captured by checkbox-style predetermined questions. While the first two categories of information are usually stored in computer databases as numeric, character or date fields of a table, the last category is often stored as free text.

Data mining using Clustering: Use and Implementation
In crime terminology a cluster is a group of crimes in a geo-spatial region, or a hot spot of crime. In data mining terminology a cluster is a group of similar data points, that is, a possible crime pattern. Hence clustering algorithms in data mining are equivalent to identifying groups of records that are similar to one another but differ from the rest of the groups. Thus we need to work on the variances of the data between the groups and within the groups. In our case some of these clusters or groups will be useful for identifying a crime spree committed by one suspect or the same group of suspects.

Given this information, the next challenge is to find the variables that provide the best clustering. The variables can be crime type, suspect race, age, sex, crime weapon, motives, etc. Without a suspected crime pattern, the detective will be less likely to build the complete picture from bits of information from different crime incidents. Today most of this process of connecting the data points is done manually with the help of multiple spreadsheet reports that the detectives usually get from the computer data analysts and their own crime logs.

Simple example:

CRIME TYPE*  SUSPECT AGE  VICTIM AGE  WEAPON   SITE*
TA           25           37          Bomb     1
TA           34           56          Bomb     1
TA           28           13          Grenade  2
MA           25           34          Mine     3
MA           35           65          Bomb     3

*Let,
TA - Terrorist attack, labeled as 1.
MA - Maoist attack, labeled as 2.
Sites are Jammu = 1, Delhi = 2, Jharkhand = 3.
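
As a minimal sketch of the k-means procedure described in the Survey of Literature, the following Python code clusters the five example records on the two labeled variables (crime type and site) with k = 2. The function names and the choice of the first and last records as starting centroids are assumptions made for illustration; this is not the SPSS procedure used to produce the results reported in this paper.

```python
from math import dist


def centroid(points):
    """Arithmetic mean of each dimension over all points in a cluster."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def k_means(points, centroids, max_iter=100):
    """Plain k-means: assign each point to its nearest centroid, then
    recompute the centroids, until the assignment stops changing."""
    centroids = list(centroids)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centroids)), key=lambda c: dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = centroid(members)
    return labels, centroids


# (crime type, site) pairs coded as in the legend:
# TA = 1, MA = 2; Jammu = 1, Delhi = 2, Jharkhand = 3.
records = [(1, 1), (1, 1), (1, 2), (2, 3), (2, 3)]

# Assumption: the first and last records serve as the initial centroids.
labels, centers = k_means(records, [records[0], records[-1]])
print(labels)
print(centers)
```

With these starting centroids the sketch places the three TA records in one cluster and the two MA records in the other. A different initialization or a different choice of variables may give a different grouping.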

Output of statistical tools like SPSS for two-means clustering on the variables crime type and site is given below:

Figure: Pie chart showing cluster size

The figure shows that we obtained two clusters: cluster 1 consists of three cases, and cluster 2 consists of two cases.

From the above figure it is clear that the first cluster contains Sites 1 and 2, both of which come under the terrorist attack type, and the second cluster contains Site 3, which is of the Maoist attack crime type.

The figure can be plotted as:

Figure: Map of the crime sites by plotting the cluster analysis data.

Thus this picture gives valuable information about the crime pattern on the basis of the crime sites and the type of crime committed. This type of technique is very helpful for the crime department in determining the type of crime committed, by segregating the vast source of information and pulling out the threads that carry the necessary meaning and information.

Early Warning System: Architecture

Typically an EWAS consists of five phases [5], which are all associated with each other. The first phase is risk assessment, which involves understanding and mapping the threat; it is followed by prevention of the threat. The third phase is about monitoring the risk in order to detect possible early warnings, including forecasting near-future events. In the fourth phase, comprehensible warnings are disseminated to political authorities and the inhabitants. In the fifth phase of the system, appropriate and timely actions are carried out in response to the warnings.

In this paper we discuss the architecture of an EWAS [5] that can be used to warn in case of a threat or attack.
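
The five phases can be pictured as an ordered cycle in which monitoring feeds dissemination and response. The following is a minimal sketch only: the `Phase` names follow the phases listed above, while the `threat_score` field, the 0.7 threshold, the sample events and the recipient list are hypothetical values chosen for illustration and are not part of the EWAS design cited in this paper.

```python
from enum import IntEnum


class Phase(IntEnum):
    """The five EWAS phases, in the order given in the text."""
    RISK_ASSESSMENT = 1
    PREVENTION = 2
    MONITORING = 3
    DISSEMINATION = 4
    RESPONSE = 5


def monitor(events, threshold=0.7):
    """Monitoring: keep events whose (hypothetical) threat score reaches the threshold."""
    return [e for e in events if e["threat_score"] >= threshold]


def disseminate(warnings, recipients):
    """Dissemination: pair every warning with every recipient."""
    return [(r, w["summary"]) for w in warnings for r in recipients]


def respond(notifications):
    """Response: record one action per delivered notification."""
    return [f"{recipient} acts on: {summary}" for recipient, summary in notifications]


# Hypothetical monitored events; the scores are made up for illustration.
events = [
    {"summary": "suspicious purchase reported at site 1", "threat_score": 0.9},
    {"summary": "routine traffic incident", "threat_score": 0.1},
]
recipients = ["authorities", "inhabitants"]

warnings = monitor(events)
notifications = disseminate(warnings, recipients)
actions = respond(notifications)
print([p.name for p in Phase])
print(actions)
```

Only the event that reaches the threshold is turned into a warning, and each recipient then receives it once. Risk assessment and prevention are represented here only as phase names, since the text gives no further detail on how they operate.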

Figure: System architecture

A media monitoring service provides clients with documentation, analysis, or copies of media content of interest to them. Services that read and analyze thousands of news items from heterogeneous sources and provide consolidated channels to access the news have become available.

The architecture of EWAS consists of three main components (as shown in the figure):

1. Input (Data gathering sources)
a. News consolidation system
b. Data extraction by Spiner [6]
c. Data gathered manually from the field

2. Processing
a. Semantic entity and relationship management system
b. Social network analysis (SNA) tools
c. Framework for social scientists’ interaction, like the SOMA portal [7]
d. Rule based engine

3. Output system
a. Warning generation system
b. Dissemination system

Component description:

1. Input (Data gathering sources)
a) News consolidation system: The news consolidation system can be
something that helps the data analyst by providing structured information about entities of interest, obtained by mining news documents from different sources. The Europe Media Monitor (EMM) RSS feed is only one such source for the news consolidation system.

b) By using Web analytics one can use “Spiner”, which was proposed in a recent research paper [6]. Spiner can be a useful system in a dark web analysis scenario. Efficient wrappers were designed for the disguised websites of different banned terrorist groups, terrorism databases, and government information sites. The system’s web robots crawl through these sites and extract valuable information. According to the analysis in that paper, Spiner is capable of handling the dynamic and chaotic character of today’s World Wide Web by using structural data mining enhanced with social network analysis tools and techniques.

c) Data gathering is done manually.

2. Processing
a) Semantic entity and relationship management system:
The most important things in early warning include actors (like individuals, groups, organizations, and countries) and resources. Knowledge about these things is expressed through the attributes and character of the actors and is represented as profiles. Finally, the situations, conditions and context in which the events take place are important in that they may create constraints on the actors in their choice of actions, or make available opportunities that would otherwise not have been possible.

There are numerous ways to refer to an actor of an event, which makes the identification of the entities and their relationships a real problem. In addition, in terrorism, actors commonly have name aliases. Hence we need a mechanism to identify the entities and then use the news base to measure the relations among these actors.

b) Social network analysis tool:
After data gathering, we need to provide social scientists with some data
mining and social network analysis tools that can be used to detect the main players and masterminds in such networks of actors, and to estimate their influence and the dependencies among them.

c) Framework for interface for social scientists: A GUI-based platform is required to provide social scientists and investigators with a tool in which they can create their own rule-based theories, manage and keep updating them, and test, analyze, and prototype them. We assume the use of the Stochastic Opponent Modeling Agents (SOMA) Terrorist Organizations Portal (STOP) [7] for this purpose.

d) Rule-based engine:
A rule-based engine is used, in which a set of rules is decided in advance so that when any news item or data arrives, the best rule for it is identified. The engine also has provisions for mechanisms and schemes for rating and grading the rules that reside in the rule-based engine and for tracking their success rate.

e) Warning generation engine:
Warning generation in such a system would be a function of a rule or a number of rules. The task is accomplished by analyzing the rules of the engine, i.e., the warnings can be generated by sequencing the rules flagged true by the rule-based engine.

3. Output (Dissemination system)
a) A dissemination system is needed to deliver the warnings to subscriber devices. The notification system should be able to perform two tasks: managing subscription information and handling the subscriber devices. A simple and realistic approach is to support at least two kinds of devices: an email receiver (i.e., an internet-enabled machine) and an SMS and call receiver (a landline or mobile phone).

Conclusion
There is ample scope in data mining to use valuable data and extract information from it through further studies in this area, as is done in this paper.
EWAS is designed to be an early warning system on which more advanced data mining research can be carried out. There can be more work on Web analytics and structural data mining techniques to enhance the investigative processes further.

References:
[1] Wikipedia.org
[2] David Jensen, “Data Mining in Networks,” presentation at CSIS Data Mining Roundtable, Washington, D.C., July 23, 2003.
[3] Yi Lu, Shiyong Lu, Farshad Fotouhi, Youping Deng, and Susan Brown, "FGKA: A Fast Genetic K-means Algorithm", in Proc. of the 19th ACM Symposium on Applied Computing.
[4] Nasrullah Memon, Uffe Kock Wiil, “Design and Development of an Early Warning System to Prevent Terrorist Attacks”
[5] Best, C., van der Goot, E., Blackler, K., Garcia, T., Horby, D.: Europe Media Monitor. Technical Report EUR 22173 EN, European Commission (2005)
[6] Memon Nasrullah; Qureshi Abdul Rasool; Hicks David; Harkiolakis Nicholas: Extracting Information from Semi-Structured Documents, In Y. Ishikawa et al. (Eds.): APWeb 2008 Workshops, LNCS 4977, pp. 54–64, 2008
[7] Aaron Mannes, Mary Michael, Amy Pate, Amy Sliva, V.S. Subrahmanian, Jonathan Wilkenfeld. "Stochastic Opponent Modeling Agents: A Case Study with Hezbollah," First International Workshop on Social Computing, Behavioral Modeling, and Prediction, April 2008
