
TABLE OF CONTENTS

1. Abstract
2. Introduction
3. Existing System
4. Disadvantages
5. Proposed System
6. Advantages
7. System Architecture
8. Flow Diagram
9. Use Case Diagram
10. Class Diagram
11. Sequence Diagram
12. ER Diagram
13. Testing of Product
14. Modules
15. Modules Description
16. Algorithm Description
17. Software Requirements
18. Hardware Requirements
19. H/W & S/W Description
20. Literature Survey
21. Screen Shots
22. Future Enhancement
23. Conclusion
24. References

DISCOVERY OF RANKING FRAUD FOR MOBILE APPS


ABSTRACT
In today's online App marketplaces, ranking fraud has become a serious and
growing problem. Ranking fraud in the mobile App market refers to fraudulent or
deceptive activities which have a purpose of bumping up the Apps in the
popularity list. Indeed, it has become more and more frequent for App developers
to use shady means, such as inflating their Apps' sales or posting phony App
ratings, to commit ranking fraud. While the importance of preventing ranking
fraud has been widely recognized, there is limited understanding and research in
this area. Several recent studies have pointed out that advertising in mobile
apps is plagued by various types of fraud. In Web advertising, most fraud
detection is centered on analyzing server-side logs or network traffic, which
are mostly effective for detecting bot-driven ads. These can also reveal
placement fraud to some degree, but such detection is possible only after
fraudulent impressions and clicks have been created. While this may be feasible
for mobile apps, we explore a qualitatively different approach: to detect
fraudulent behavior by analyzing the structure of the app, an approach that can
detect placement fraud more effectively and before an app is used. Our approach
leverages the highly specific, and legally enforceable, terms and conditions
that ad networks place on app developers. This project investigates three types
of evidence, namely ranking-based, rating-based and review-based evidence,
aggregated for all apps, by modeling Apps' ranking, rating and review behaviors
through statistical hypothesis tests. It proposes an optimization-based
aggregation method to identify fraudulent ranking behavior.

INTRODUCTION
Web spam refers to all forms of malicious manipulation of user-generated data so
as to influence usage patterns of the data. The number of mobile Apps has grown
at a breathtaking rate over the past few years. To stimulate the development of
mobile Apps, many App stores launched daily App leaderboards, which display the
chart rankings of the most popular Apps. Indeed, the App leaderboard is one of
the most important channels for promoting mobile Apps. A higher rank on the
leaderboard usually leads to a huge number of downloads and millions of dollars
in revenue. Therefore, App developers tend to explore various means, such as
advertising campaigns, to promote their Apps in order to have their Apps ranked
as high as possible on such App leaderboards. Indeed, our careful observation
reveals that mobile Apps are not always ranked high in the leaderboard, but only
in some leading events, which form different leading sessions. Note that we will
introduce both leading events and leading sessions in detail later. In other
words, ranking fraud usually happens in these leading sessions. Therefore,
detecting ranking fraud of mobile Apps is actually to detect ranking fraud
within the leading sessions of mobile Apps.

Several recent studies have pointed out that advertising in mobile (smartphone
and tablet) apps is plagued by various types of fraud.

Mobile app advertisers are estimated to lose nearly 1 billion dollars (12% of the
mobile ad budget) in 2013 due to these frauds. The frauds fall under two main
categories: (1) Bot-driven frauds employ bot networks or paid users to initiate fake
ad impressions and clicks (more than 18% impressions/clicks come from bots), and
(2) Placement frauds manipulate visual layouts of ads to trigger ad impressions and
unintentional clicks from real users (47% of user clicks are reportedly accidental).
Mobile app publishers are incentivized to commit such frauds since ad networks
pay them based on impression count, click count, or more commonly, combinations
of both. Bot-driven ad frauds have been studied recently, but placement frauds in
mobile apps have not received much attention from the academic community. In
this paper, we make two contributions. First, we present the design and
implementation of a scalable system for automatically detecting ad placement
fraud in mobile apps. Second, using a large collection of apps, we characterize
the prevalence of ad placement fraud. Detecting ad fraud: in Web advertising, most fraud
detection is centered on analyzing server-side logs or network traffic, which are
mostly effective for detecting bot-driven ads. These can also reveal placement
frauds to some degree (e.g., an ad not shown to users will never receive any
clicks), but such detection is possible only after fraudulent impressions and clicks
have been created. While this may be feasible for mobile apps, we explore a
qualitatively different approach: to detect fraudulent behavior by analyzing the
structure of the app, an approach that can detect placement frauds more effectively
and before an app is used (e.g., before it is released to the app store). Our approach
leverages the highly specific, and legally enforceable, terms and conditions that ad
networks place on app developers. For example, Microsoft Advertising says
developers must not edit, resize, modify, filter, obscure, hide, make transparent, or
reorder any advertising. Despite these prohibitions, app developers continue to
engage in fraud: Figure shows (on the left) an app in which 3 ads are shown at the

bottom of a page while ad networks restrict developers to 1 per page, and (on the
right) an app in which an ad is hidden behind UI buttons. The key insight in our
work is that manipulation of the visual layout of ads in a mobile app can be
programmatically detected by combining two key ideas: (a) a UI automation tool
that permits automated traversal of all the pages of a mobile app, and (b)
extensible fraud checkers that test the visual layout of each page for compliance
with an ad network's terms and conditions. While we use the term ad fraud, we
emphasize that our work deems as fraud any violation of published terms and
conditions, and does not attempt to infer whether the violations are intentional
or not.

This survey categorizes, compares, and summarizes almost all published technical
and review articles on automated fraud detection within the last 10 years. It
defines the professional fraudster, formalizes the main types and subtypes of
known fraud, and presents the nature of data evidence collected within
affected industries. Within the business context of mining the data to achieve
higher cost savings, this research presents methods and techniques together with
their problems. Compared to all related reviews on fraud detection, this survey
covers much more technical articles and is the only one, to the best of our
knowledge, which proposes alternative data and solutions from related domains.
Data mining is about finding insights which are statistically reliable, unknown
previously, and actionable from data. This data must be available, relevant,
adequate, and clean. Also, the data mining problem must be well-defined, must
not be solvable by query and reporting tools, and must be guided by a data
mining process model. The term fraud here refers to the abuse of a profit
organization's system
without necessarily leading to direct legal consequences. In a competitive
environment, fraud can become a business critical problem if it is very prevalent
and if the prevention procedures are not fail-safe. Fraud detection, being part of the
overall fraud control, automates and helps reduce the manual parts of a

screening/checking process. This area has become one of the most established
industry/government data mining applications. It is impossible to be absolutely
certain about the legitimacy of and intention behind an application or transaction.
Given this reality, the best cost-effective option is to tease out possible evidence of
fraud from the available data using mathematical algorithms. Evolved from
numerous research communities, especially those from developed countries, the
analytical engine within these solutions and software are driven by artificial
immune systems, artificial intelligence, auditing, database, distributed and parallel
computing, econometrics, expert systems, fuzzy logic, genetic algorithms, machine
learning, neural networks, pattern recognition, statistics, visualization and others.
There are plenty of specialized fraud detection solutions and software which
protect businesses such as credit card, e-commerce, insurance, retail,
telecommunications industries. There are often two main criticisms of data mining-based fraud detection research: the dearth of publicly available real data to perform
experiments on; and the lack of published well-researched methods and techniques.
To counter both of them, this paper garners all related literature for categorization
and comparison, selects some innovative methods and techniques for discussion;
and points toward other data sources as possible alternatives. Many Android
applications are distributed for free but are supported by advertisements. Ad
libraries embedded in the app fetch content from the ad provider and display it on
the app's user interface. The ad provider pays the developer for the ads displayed to
the user and ads clicked by the user. A major threat to this ecosystem is ad fraud,
where a miscreant's code fetches ads without displaying them to the user
or "clicks" on ads automatically. Ad fraud has been extensively studied in the
context of web advertising but has gone largely unstudied in the context of mobile
advertising. Online advertising is a financial pillar that supports both free Web
content and services, and free mobile apps. Both web and mobile advertising use a

similar infrastructure: the ad library embedded in the web page or mobile app
fetches content from ad providers and displays it on the web page or the mobile
app's user interface. The ad provider pays the developer for the ads displayed
(impressions) and the ads clicked (clicks) by the user. Because web and mobile
advertising use a similar infrastructure, they are subject to the same security
concerns, such as tracking and privacy infringements. Perhaps the biggest threat to
the sustainability of this ecosystem is ad fraud, where a miscreant's code fetches
ads without displaying them to the user or "clicks" on ads programmatically. Ad
fraud has been extensively studied in the context of web advertising but has gone
largely unstudied in the context of mobile advertising. On the web, ad fraud is
often perpetrated by botnets, which are collections of compromised user machines
called bots. Fraudsters issue fabricated impressions and clicks using bots so that
the traffic they generate is varied (i.e., by IP address), making the fraud harder to
detect. We take the first step to study fraud and other undesirable behavior in
mobile advertising. First, we identify unique characteristics of mobile ad fraud. On
Android, at any time at most one app is running in the foreground, where the app
has a UI. Our first observation is that when an app fetches ads while it is in the
background, this is most likely fraudulent, because the app developer gets credit
for this ad impression without displaying it to the user. Our second observation is
that when an app clicks an ad without user interaction, it is definitely fraudulent.
Based on our observations, we set out to measure the prevalence of ad fraud in the
wild. We use two sets of apps: 1) 130,339 apps crawled from 19 Android markets
including Play and many third-party markets, and 2) 35,087 apps that likely
contain malware provided by a security company. We build a testing infrastructure,
where we launch multiple instances of the Android emulator concurrently. In each
emulator, we install an app from our datasets, run it for a fixed time, push it to the
background, and continue running for a fixed time, while capturing all the network

traffic from the emulator. Finally, we extract impressions, clicks, and other ad
related activities from the network traffic.
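The two observations above (a background ad fetch is most likely fraudulent; a click without user interaction is definitely fraudulent) lend themselves to a simple rule-based check. A minimal sketch follows; the event fields here are hypothetical stand-ins for what the emulator's captured network traffic would provide, not the measurement system's actual schema:

```python
# Classify one logged ad request using the two heuristics described above.
# Fields are illustrative: a real pipeline would derive them from captured
# network traffic and the emulator's foreground/background state.
from dataclasses import dataclass

@dataclass
class AdEvent:
    kind: str             # "impression" or "click"
    app_foreground: bool  # was the app in the foreground when the request fired?
    user_touched: bool    # was there a user touch just before the request?

def classify(event: AdEvent) -> str:
    """Return 'fraud', 'suspicious', or 'ok' for one logged ad request."""
    if event.kind == "click" and not event.user_touched:
        return "fraud"        # automated click: no user interaction at all
    if event.kind == "impression" and not event.app_foreground:
        return "suspicious"   # impression credited while nothing is displayed
    return "ok"
```

For example, `classify(AdEvent("click", True, False))` flags the programmatic click, while a foreground impression following a touch passes as `"ok"`.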
OVERVIEW
Several recent studies have pointed out that advertising in mobile apps is plagued
by various types of frauds. In Web advertising, most fraud detection is centered
around analyzing server-side logs or network traffic, which are mostly effective for
detecting bot-driven ads. These can also reveal placement frauds to some degree,
but such detection is possible only after fraudulent impressions and clicks have
been created.
While this may be feasible for mobile apps, we explore a qualitatively different
approach: to detect fraudulent behavior by analyzing the structure of the app, an
approach that can detect placement frauds more effectively and before an app is
used. Our approach leverages the highly specific, and legally enforceable, terms
and conditions that ad networks place on app developers.
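The structural approach described above can be illustrated with a toy "fraud checker" that inspects a page's visual layout. The two rules below (at most one ad per page; an ad must not be hidden behind other widgets) are illustrative examples in the spirit of such terms, not any particular network's exact conditions:

```python
# Toy structural fraud checker: given the rectangles of ads and other UI
# widgets on one app page, report layout violations. Rectangles are
# (left, top, right, bottom) in screen coordinates; rules are illustrative.

def overlaps(a, b):
    """True if two rectangles share any area."""
    return not (a[2] <= b[0] or b[2] <= a[0] or a[3] <= b[1] or b[3] <= a[1])

def check_page(ad_rects, widget_rects):
    """Return a list of violation descriptions for one page layout."""
    violations = []
    if len(ad_rects) > 1:
        violations.append("more than one ad on the page")
    for ad in ad_rects:
        if any(overlaps(ad, w) for w in widget_rects):
            violations.append("ad hidden behind or overlapped by a widget")
    return violations
```

A page with three ads stacked at the bottom, or an ad placed under a UI button (the two examples discussed in the introduction), would both be flagged before the app is ever run by real users.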

OBJECTIVE

Detect ranking fraud in the daily App leaderboard and avoid ranking
manipulation. The proposed framework is scalable and can be extended with other
domain-generated evidence for ranking fraud detection. The experiments also show
the scalability of the detection algorithm as well as some regularity of ranking
fraud activities.

Application:
This process focuses on research in data and knowledge engineering, developing
effective and efficient data analysis techniques for emerging data-intensive
applications. The Google baseline is used for evaluating the effectiveness of
our ranking aggregation method.

EXISTING SYSTEM
This project proposes techniques to accurately locate ranking fraud by mining
the active periods, namely leading sessions, of mobile Apps. The proposed
approach investigates three types of evidence, i.e., ranking-based, rating-based
and review-based evidence, by modeling Apps' ranking, rating and review
behaviors through statistical hypothesis tests. It proposes an optimization-based
aggregation method. The proposed framework is scalable and can be extended with
other domain-generated evidence for ranking fraud detection. A critical
challenge along this line is that the context log of each individual user may
not contain sufficient data for mining his/her context-aware preferences.
Therefore, we propose to first learn common context-aware preferences from the
context logs of many users. Then, the preference of each user can be represented
as a distribution over these common context-aware preferences. Specifically, we
develop two approaches for mining common context-aware preferences based on two
different assumptions, namely, context-independent and context-dependent
assumptions, which can fit different application scenarios.
Finally, extensive experiments on a real-world data set show that both approaches
are effective and outperform baselines with respect to mining personal context-aware preferences for mobile users. To this end, in this paper, we propose a novel
approach to mining personal context-aware preferences from context logs of
mobile users. A critical challenge for mining personal context-aware preferences is
that the context log of each individual user usually does not contain sufficient
training information. Therefore, we propose a novel crowd wisdom based approach
for mining the personal context-aware preferences for mobile users, which can
enable the building of personalized context-aware mobile recommender systems.
The contributions of this paper are summarized as follows. First, we propose a
novel approach for mining the personal context-aware preferences for mobile users
through the analysis of context-rich device logs. Specifically, we propose to first
mine common context-aware preferences from the context logs of many users and
then represent the personal context-aware preference of each user as a distribution
of common context-aware preferences. Second, we design two effective methods
for mining common context-aware preferences based on two different assumptions
about context data dependency. If context data are assumed to be conditionally
independent, we propose to mine common context-aware preferences through topic
models. Otherwise, if context data are assumed to be de-pendent, we propose to
exploit the constraint based Matrix Factorization model for mining common
context-aware preferences and only consider those contexts which are relevant to
content usage for reducing the computation complexity. Finally, we evaluate the
proposed approach using a real-world data set with context logs collected from 443
mobile phone users. In total, there are more than 8.8 million context records. The
experimental results clearly demonstrate the effectiveness of the proposed

approach and indicate some inspiring findings. The robustness of CIAP is not
good with small numbers of common context-aware preferences but becomes stable
as that number increases. This may be because CDAP leverages associations
between contexts and user content categories for extracting common
context-aware preferences, and such associations have been filtered of noisy
data. Thus, the quality of the mined common context-aware preferences remains
relatively good across different parameters, since the mining is based on
pruned training data. In contrast, CIAP leverages ACP-features
for extracting common context-aware preferences, where ACP-features usually
contain more noisy information and thus make the mining results more sensitive to
parameters. Note that, raw locations in context data, such as GPS coordinates or
cell IDs, have been transformed into semantic locations such as "Home" and
"Work Place" by some location mining approaches. The basic idea of these
approaches is to find the clusters of user locations and recognize their semantic
meaning by a time pattern analysis. Moreover, we also map the raw usage records
to the usage records of particular categories of contents. In this way, the context
data and usage records in context logs are normalized and the data sparseness
problem is somewhat alleviated. In this paper, we proposed to exploit user context
logs for mining the personal context-aware preferences of mobile users. First, we
identified common context-aware preferences from the context logs of many users.
Then, the personal context-aware preference of an individual user can be
represented as a distribution of common context-aware preferences. Finally, the
experimental results on a real-world data set clearly showed that the proposed
approach could achieve better performance than benchmark methods for mining
personal context-aware preferences; the implementation based on the independence
assumption of context data slightly outperforms the other, but has a relatively
higher computational cost. In this paper, we illustrate how to extract

personal context-aware preferences from the context-rich device logs for building
novel personalized context-aware recommender systems.
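The core idea above, representing a user's personal preference as a distribution over common context-aware preferences, can be sketched with a few EM steps for a mixture of categorical distributions. Everything here is illustrative (the category names, counts, and the choice of plain EM as a stand-in for the paper's topic-model and matrix-factorization methods):

```python
# Estimate how strongly each "common context-aware preference" (a probability
# distribution over content categories) explains one user's usage counts.
# The returned mixture weights are the user's personal preference profile.

def mixture_weights(common_prefs, usage_counts, iters=50):
    """common_prefs: list of K dicts mapping category -> probability.
    usage_counts: dict mapping category -> observed count for one user.
    Returns a list of K mixture weights summing to 1."""
    K = len(common_prefs)
    w = [1.0 / K] * K                      # start from a uniform mixture
    for _ in range(iters):
        totals = [0.0] * K
        n = 0
        for cat, cnt in usage_counts.items():
            # E-step: responsibility of each common preference for this category
            p = [w[k] * common_prefs[k].get(cat, 1e-12) for k in range(K)]
            s = sum(p)
            for k in range(K):
                totals[k] += cnt * p[k] / s
            n += cnt
        w = [t / n for t in totals]        # M-step: renormalise the weights
    return w
```

For instance, with a "work-like" preference concentrated on business apps and a "game-like" one concentrated on games, a user whose log is mostly game usage ends up with most of the mixture weight on the game-like component.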
Disadvantages:
1. It is not easy to identify and confirm the evidence linked to ranking fraud.
2. It is difficult to manually label ranking fraud for each App.
3. Ranking fraud does not always happen in the whole life cycle of an App.
4. When an App is promoted through ranking manipulation, it can reach the top
   of the leaderboard, and many new users may be induced to purchase it.
5. It can damage the reputation of other Apps.

PROPOSED SYSTEM
In the proposed system, we develop a ranking fraud detection system for mobile
Apps. Ranking fraud does not always happen in the whole life cycle of an App, so
the system needs to detect the time when fraud happens. Indeed, our careful
observation reveals that mobile Apps are not always ranked high in the
leaderboard, but only in some leading events, which form different leading
sessions. Specifically, the system first applies a simple but effective
algorithm to identify the leading sessions of each App based on its historical
ranking records. Ranking fraud in the mobile App market refers to fraudulent or
deceptive activities which have a purpose of bumping up the Apps in the
popularity list. Indeed, it becomes more and more frequent for App developers to
use shady means, such as inflating their Apps' sales or posting phony App
ratings, to commit ranking fraud. While the importance of preventing ranking
fraud has been widely recognized, there is limited understanding and research in
this area. To this end, in this paper, we provide a holistic view of ranking
fraud and propose a ranking fraud detection system for mobile Apps.
Specifically, we first propose to accurately locate the ranking fraud by mining
the active periods, namely leading sessions, of mobile Apps. Such leading
sessions can be leveraged for detecting the local anomaly instead of the global
anomaly of App rankings. Furthermore, we investigate three types of evidence,
i.e., ranking-based, rating-based and review-based evidence, by modeling Apps'
ranking, rating and review behaviors through statistical hypothesis tests. In
addition, we propose an optimization-based aggregation method to integrate all
the evidence for fraud detection. Finally, we evaluate the proposed system with
real-world App data collected from the iOS App Store over a long time period. In
the experiments, we validate the effectiveness of the proposed system, and show
the scalability of the detection algorithm as well as some regularity of ranking
fraud activities.

A key step for the mobile app usage
analysis is to classify apps into some predefined categories. However, it is a
non-trivial task to effectively classify mobile apps due to the limited
contextual information available for the analysis. To this end, in this paper,
we propose an approach to first enrich the contextual information of mobile apps
by exploiting additional Web knowledge from a Web search engine. Then, inspired
by the observation that different types of mobile apps may be relevant to
different real-world contexts, we also extract some contextual features for
mobile apps from the context-rich device logs of mobile users. Finally, we
combine all the enriched contextual information into a Maximum Entropy model for
training a mobile app classifier. The experimental results based on 443 mobile
users' device logs clearly show that our approach outperforms two
state-of-the-art benchmark methods by a significant margin. To this end, in this
paper, we propose to leverage both Web knowledge and real-world contexts to
enrich the contextual information of apps, which can improve the performance of
mobile app classification. According to some state-of-the-art work on short text
classification, an effective approach to enriching the original few and sparse
textual features is to leverage Web knowledge. Inspired by those works, we
propose to take advantage of a Web search engine to obtain snippets describing a
given mobile app in order to enrich the textual features of the app. However,
sometimes it may be difficult to obtain sufficient Web knowledge for new or
rarely used mobile apps. In this case, the relevant real-world contexts of
mobile apps may be useful. Observations from recent studies indicate that the
app usage of a mobile user is usually context-aware. For example, business apps
are likely used under a context like "Location: Work Place, Profile: Meeting",
while games are usually played under a context like "Location: Home, Is a
holiday: Yes". Compared with Web knowledge, the relevant real-world contexts of
new or rarely used mobile apps may be more available, since they can be obtained
from the context-rich device logs of the users who used them on mobile devices.
Therefore, we also propose to leverage the relevant real-world contexts of
mobile apps to improve the performance of app classification. To be specific,
the contributions of this paper are summarized as follows. First, automatic
mobile app classification is a novel problem which is still under development.
To the best of our knowledge, we are the first to leverage both Web knowledge
and relevant real-world contexts to enrich the limited contextual information of
mobile apps for solving this problem. Second, we study and extract several
effective features from both Web knowledge and real-world contexts. Then, we
propose to exploit the Maximum Entropy model (MaxEnt) to combine the effective
features and train a very effective mobile app classifier. Finally, to evaluate
the proposed approach, we conduct extensive experiments on the context-rich
mobile device logs collected from 443 mobile users, which contain 680 unique
mobile apps and more than 8.8 million app usage records. The experimental
results clearly show that our approach outperforms two state-of-the-art
benchmark approaches by a significant margin.

The primary objective of this paper is to define existing challenges in this
domain for the different types of large data sets and streams. It categorizes,
compares, and summarizes relevant data mining-based fraud detection methods and
techniques in published academic and industrial research.
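The leading-session identification step mentioned above can be sketched as follows: a leading event is taken to be a maximal run of consecutive days on which the App ranks within the top K*, and nearby events are merged into one leading session. The threshold K* = 300 and the 2-day merge gap are illustrative assumptions, not the system's tuned values:

```python
# Identify leading sessions from an App's historical daily ranking records.
# daily_ranks: list of (day_index, rank) pairs sorted by day.

def leading_sessions(daily_ranks, k_star=300, gap=2):
    """Return a list of (start_day, end_day) leading sessions."""
    # Step 1: extract leading events (maximal runs of days with rank <= k_star).
    events, run = [], None
    for day, rank in daily_ranks:
        if rank <= k_star:
            if run and day == run[1] + 1:
                run = (run[0], day)          # extend the current leading event
            else:
                if run:
                    events.append(run)
                run = (day, day)             # start a new leading event
        else:
            if run:
                events.append(run)
            run = None
    if run:
        events.append(run)
    # Step 2: merge events at most `gap` days apart into one leading session.
    sessions = []
    for ev in events:
        if sessions and ev[0] - sessions[-1][1] <= gap:
            sessions[-1] = (sessions[-1][0], ev[1])
        else:
            sessions.append(ev)
    return sessions
```

On a record where the App is in the top 300 on days 1–2, 4–5 and 8, the events on days 1–2 and 4–5 are merged into one session (gap of 2 days) while day 8 starts a separate session.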

Advantages:
1. Detects ranking fraud in the daily App leaderboard.
2. Avoids ranking manipulation.
3. The proposed framework is scalable and can be extended with other
   domain-generated evidence for ranking fraud detection.
4. The detection algorithm is scalable and reveals some regularity of ranking
   fraud activities.
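In spirit, the optimization-based aggregation listed above combines the three evidence scores into a single fraud score for a leading session. The sketch below uses a fixed weighted sum as a stand-in; in the actual framework the weights would be learned by the optimization, and every constant here is illustrative:

```python
# Combine ranking-, rating- and review-based evidence scores (each assumed
# scaled to [0, 1]) into one fraud score, and flag high-scoring sessions.

def aggregate(ranking_ev, rating_ev, review_ev,
              weights=(0.4, 0.3, 0.3), threshold=0.5):
    """Return (combined_score, flagged) for one leading session."""
    assert abs(sum(weights) - 1.0) < 1e-9   # weights form a convex combination
    score = (weights[0] * ranking_ev
             + weights[1] * rating_ev
             + weights[2] * review_ev)
    return score, score > threshold
```

A session with strong evidence on all three channels (say 0.9, 0.8, 0.7) scores 0.81 and is flagged, while uniformly weak evidence stays below the threshold.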

SYSTEM ARCHITECTURE

DATA FLOW DIAGRAM

1. Start
2. Process selection for user and admin
3. User App selection and view historical data
4. Ranking validation: view last rating-based evidence
5. View user review validation
6. User rating
7. User review
8. Mining leading session
9. End

USE CASE DIAGRAM

SEQUENCE DIAGRAM

E-R DIAGRAM

TESTING OF PRODUCT
System testing is the stage of implementation aimed at ensuring that the
system works accurately and efficiently before live operation commences.
Testing is the process of executing a program with the intent of finding an
error. A good test case is one that has a high probability of finding an error.
A successful test is one that uncovers an as-yet-undiscovered error.

Testing is vital to the success of the system. System testing makes the
logical assumption that if all parts of the system are correct, the goal will be
successfully achieved. The candidate system is subjected to a variety of tests:
on-line response, volume, stress, recovery, security and usability tests. A
series of tests are performed before the system is ready for user acceptance
testing. Any engineered product can be tested in one of the following ways.
Knowing the specified function that a product has been designed to perform,
tests can be conducted to demonstrate that each function is fully operational.
Knowing the internal workings of a product, tests can be conducted to ensure
that all gears mesh, that is, the internal operation of the product performs
according to the specification and all internal components have been adequately
exercised.
UNIT TESTING:
Unit testing tests each module before the integration of the overall system
is done. Unit testing focuses verification effort on the smallest unit of
software design, the module; it is also known as module testing. The modules of
the system are tested separately. This testing is carried out during programming
itself. In this testing step, each module is found to be working satisfactorily
as regards the expected output from the module. There are validation checks for
the fields. For example, a validation check is done to verify the data given by
the user, covering both the format and the validity of the data entered. This
makes it very easy to find errors and debug the system.
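The field-validation checks mentioned above are exactly the kind of logic that unit testing exercises in isolation. A minimal sketch using Python's unittest module; the validator and its rule (a rating must be an integer from 1 to 5) are hypothetical examples, not the project's actual fields:

```python
import unittest

def valid_rating(value):
    """Hypothetical field validator: a rating must be an integer from 1 to 5."""
    return isinstance(value, int) and 1 <= value <= 5

class RatingFieldTest(unittest.TestCase):
    def test_accepts_values_in_range(self):
        # format is correct (int) and value is within the valid range
        for v in (1, 3, 5):
            self.assertTrue(valid_rating(v))

    def test_rejects_bad_format_and_range(self):
        # wrong type (format check) or out-of-range value (validity check)
        for v in (0, 6, "4", 3.5, None):
            self.assertFalse(valid_rating(v))
```

Each module's checks get their own test case like this, so a failure points directly at the module responsible.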
INTEGRATION TESTING:
Data can be lost across an interface; one module can have an adverse effect
on another; and sub-functions, when combined, may not produce the desired major
function. Integration testing is systematic testing that can be done with sample
data. The need for integration testing is to assess the overall system
performance. There are two types of integration testing:
1. Top-down integration testing.
2. Bottom-up integration testing.

WHITE BOX TESTING:
White box testing is a test case design method that uses the control
structure of the procedural design to derive test cases. Using white box testing
methods, we derived test cases that guarantee that all independent paths within
a module have been exercised at least once.
BLACK BOX TESTING:
Black box testing is done to find:
1. Incorrect or missing functions
2. Interface errors
3. Errors in external database access
4. Performance errors
5. Initialization and termination errors

Functional testing is performed to validate that an application conforms to
its specifications and correctly performs all its required functions, so this
testing is also called black box testing. It tests the external behavior of the
system. Here, the engineered product is tested knowing the specified function
that the product has been designed to perform; tests are conducted to
demonstrate that each function is fully operational.
VALIDATION TESTING:
After the culmination of black box testing, the software is completely
assembled as a package, interfacing errors have been uncovered and corrected,
and a final series of software validation tests begins. Validation testing can
be defined in many ways, but a simple definition is that validation succeeds
when the software functions in a manner that can be reasonably expected by the
customer.
USER ACCEPTANCE TESTING:
User acceptance of the system is the key factor for the success of the
system. The system under consideration was tested for user acceptance by
constantly keeping in touch with prospective users during development and making
changes whenever required.
OUTPUT TESTING:
After performing validation testing, the next step is output testing of the
proposed system, since no system can be useful if it does not produce the
required output in the specified format; this involves asking the user about the
format required. The output displayed or generated by the system under
consideration is examined in two forms: on screen and in printed format. The
output format on the screen was found to be correct, as the format was designed
in the system design phase according to user needs. For the hard copy, the
output also meets the requirements specified by the user. Hence output testing
did not result in any corrections to the system.
System Implementation:
Implementation of software refers to the final installation of the package
in its real environment, to the satisfaction of the intended users, and the
operation of the system. Users are often not sure that the software is meant to
make their job easier, so:
1. The active user must be made aware of the benefits of using the system.
2. The user's confidence in the software must be built up.
3. Proper guidance must be imparted to the user so that he is comfortable
   using the application.

Before going ahead and viewing the system, the user must know that, to view
the results, the server program should be running on the server. If the server
object is not running on the server, the actual processes will not take place.

User Training:
To achieve the objectives and benefits expected from the proposed system
it is essential for the people who will be involved to be confident of their role in the
new system. As systems become more complex, the need for education and
training becomes more and more important.
Education is complementary to training. It brings life to formal training
by explaining the background of the resources to the users. Education involves
creating the right atmosphere and motivating user staff. Educational information
can make training more interesting and more understandable.
Training on the Application Software:
After providing the necessary basic training on the computer
awareness, the users will have to be trained on the new application software. This
will give the underlying philosophy of the use of the new system such as the screen
flow, screen design, type of help on the screen, type of errors while entering the
data, the corresponding validation check at each entry and the ways to correct the
data entered. This training may be different across different user groups and across
different levels of hierarchy.

Operational Documentation:
Once the implementation plan is decided, it is essential that the user of the
system is made familiar and comfortable with the environment. A documentation
providing the whole operations of the system is being developed. Useful tips and
guidance is given inside the application itself to the user. The system is developed
user friendly so that the user can work the system from the tips given in the
application itself.
System Maintenance:
The maintenance phase of the software cycle is the time in which
software performs useful work. After a system is successfully implemented, it
should be maintained in a proper manner. System maintenance is an important
aspect of the software development life cycle. The need for system maintenance is
to make the system adaptable to changes in its environment. There may be social,
technical and other environmental changes which affect a system that is being
implemented. Software product enhancements may involve providing new
functional capabilities, improving user displays and modes of interaction, or
upgrading the performance characteristics of the system. Only through proper
system maintenance procedures can the system be adapted to cope with these
changes. Software maintenance is, of course, far more than finding mistakes.
Corrective Maintenance:
The first maintenance activity occurs because it is unreasonable to
assume that software testing will uncover all latent errors in a large software
system. During the use of any large program, errors will occur and be reported

to the developer. The process that includes the diagnosis and correction of one or
more errors is called Corrective Maintenance.

Adaptive Maintenance:
The second activity that contributes to a definition of maintenance
occurs because of the rapid change that is encountered in every aspect of
computing. Therefore Adaptive Maintenance, termed as the activity that modifies
software to properly interface with a changing environment, is both necessary and
commonplace.
Perfective Maintenance:
The third activity that may be applied to a definition of maintenance
occurs when a software package is successful. As the software is used,
recommendations for new capabilities, modifications to existing functions, and
general enhancements are received from users. To satisfy requests in this category,
perfective maintenance is performed. This activity accounts for the majority of all
effort expended on software maintenance.
Preventive Maintenance:
The fourth maintenance activity occurs when software is changed to
improve future maintainability or reliability, or to provide a better basis for future
enhancements. Often called preventive maintenance, this activity is characterized
by reverse engineering and re-engineering techniques.

MODULES:
1. App Process Selection
2. Rating Based Evidences
3. Review Based Evidences
4. Admin Analyze Rank Evidence Aggregation

MODULE DESCRIPTION

App Process Selection:


In the App Process Selection module we choose the app process role. If the
entering user wants to view an app, they may give a rating or a review, or simply
select the best app for their personal use. If the entrant wants to perform the admin
process, they must give their authentication before proceeding; this authentication
check is not applied at the user gateway entry. The system tracks which person
performs which activity in our project and what they have done; the mining
session keeps a record of all this information, provided the action commits valid
information to the process.

Rating Based Evidence:


The rating based evidence model is useful for ranking fraud
detection. We also study how to extract fraud evidences from an App's historical
rating records. Specifically, after an App has been published, it can be
rated by any user who downloaded it. Indeed, user rating is one of the most
important features of App advertisement. An App which has a higher rating may
attract more users to download it and can also be ranked higher in the leaderboard.
Thus, rating manipulation is an important perspective of ranking fraud.
Intuitively, if an App has ranking fraud in a leading session s, the ratings during
that time period may show anomalous patterns compared with its historical ratings,
which can be used for constructing rating based evidences. In our project we verify
the user id of whoever comes to give a rating for a particular app.
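As an illustration of how such a rating based evidence might be computed, the following Java sketch flags a leading session whose mean rating deviates sharply from the App's historical mean. All class and method names, and the threshold, are illustrative assumptions, not the paper's actual evidence formula.

```java
// Hypothetical sketch: flag a leading session whose mean rating is
// anomalously high compared with the App's historical ratings.
public class RatingEvidence {

    // Arithmetic mean of an array of ratings.
    static double mean(double[] ratings) {
        double sum = 0;
        for (double r : ratings) sum += r;
        return sum / ratings.length;
    }

    // Sample standard deviation of the historical ratings.
    static double stdDev(double[] ratings) {
        double m = mean(ratings);
        double ss = 0;
        for (double r : ratings) ss += (r - m) * (r - m);
        return Math.sqrt(ss / (ratings.length - 1));
    }

    // A session is suspicious if its mean rating lies more than
    // `threshold` standard deviations above the historical mean.
    static boolean isSuspicious(double[] historical, double[] session, double threshold) {
        double z = (mean(session) - mean(historical)) / stdDev(historical);
        return z > threshold;
    }

    public static void main(String[] args) {
        double[] historical = {3.0, 3.5, 2.5, 3.0, 3.5, 2.5, 3.0};
        double[] session    = {5.0, 5.0, 4.5, 5.0};  // burst of near-perfect ratings
        System.out.println(isSuspicious(historical, session, 2.0));
    }
}
```

A z-score is only one possible anomaly measure; any statistic that contrasts the session's rating distribution with the historical one would serve the same role.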

Review Based Evidence:


Review manipulation is one of the most important perspectives of App ranking
fraud: a mobile App containing more positive reviews may attract more users to
download it. Therefore, imposters often post fake reviews in the leading sessions of
a specific App in order to inflate the App's downloads, and thus propel the App's
ranking position in the leaderboard. Some previous works on review spam
detection have been reported in recent years, but the problem of detecting the local
anomaly of reviews in the leading sessions and capturing them as evidences for
ranking fraud detection is still under-explored. To this end, here we propose two
fraud evidences based on Apps' review behaviors in leading sessions for detecting
ranking fraud.
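One simple review based signal along these lines is the fraction of duplicated review texts in a leading session, since fake reviews are often copy-pasted. The Java sketch below is illustrative only; the class and method names, and the idea of using exact duplicates, are assumptions rather than the paper's actual evidences.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: measure how many reviews in a leading session
// are exact (case-insensitive) duplicates of an earlier review.
public class ReviewEvidence {

    // Fraction of reviews that repeat an earlier review in the session.
    static double duplicateRatio(String[] reviews) {
        Map<String, Integer> counts = new HashMap<>();
        int duplicates = 0;
        for (String r : reviews) {
            String key = r.trim().toLowerCase();
            int seen = counts.merge(key, 1, Integer::sum);
            if (seen > 1) duplicates++;  // this text was already posted
        }
        return (double) duplicates / reviews.length;
    }

    public static void main(String[] args) {
        String[] session = {
            "Best app ever!", "best app ever!", "Best App Ever!",
            "Crashes on my phone", "Useful for daily tasks"
        };
        // 2 of the 5 reviews repeat an earlier one.
        System.out.println(duplicateRatio(session));
    }
}
```

In practice near-duplicate detection (e.g. shingling) and sentiment scores would replace the exact-match test used here.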
Admin Analyze Rank Evidence Aggregation:
An App's ranking behaviors in a leading event always satisfy a specific ranking
pattern, which consists of three different ranking phases, namely, a rising phase, a
maintaining phase and a recession phase. Specifically, in each leading event, an
App's ranking first increases to a peak position in the leaderboard, then keeps that
peak position for a period, and finally decreases till the end of the event. In this
phase the admin takes the rating and review aggregation to find the fraudulent app.
Based on this leading session the admin gives a rank to each App. There are many
ranking and evidence aggregation methods in the literature, such as permutation
based models, score based models and Dempster-Shafer rules. However, some of
these methods focus on learning a global ranking for all candidates. This is not
proper for detecting ranking fraud for new Apps. Other methods are based on
supervised learning techniques, which depend on labeled training data and are hard
to exploit. Instead, we propose an unsupervised approach based on fraud similarity
to combine these evidences.
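A score based combination of the kind mentioned above can be sketched as a weighted sum of normalized evidence scores. The fixed weights below are purely illustrative; the proposed unsupervised approach would derive the combination from fraud similarity rather than hand-picked weights.

```java
// Hypothetical sketch: combine normalized evidence scores (ranking-,
// rating- and review-based, each in [0,1]) into one fraud score.
public class EvidenceAggregation {

    // Weighted sum of the evidence scores.
    static double aggregate(double[] evidences, double[] weights) {
        double score = 0;
        for (int i = 0; i < evidences.length; i++) {
            score += weights[i] * evidences[i];
        }
        return score;
    }

    public static void main(String[] args) {
        double[] evidences = {0.9, 0.7, 0.8};  // ranking, rating, review evidence
        double[] weights   = {0.4, 0.3, 0.3};  // illustrative; should sum to 1
        System.out.println(aggregate(evidences, weights));
    }
}
```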

Tool Description

POS TAGGER (Part-Of-Speech Tagging)

In part of speech tagging algorithms, two main sources of information are used to
compute the probability that a specific tag is correct: the probability of a specific
tag for a specific word and the relative probability of the current sequence of tags
in English. To combine these two probabilities and determine the best overall tag
for a given word, many statisticians use Hidden Markov Models (HMMs).
To understand how an HMM works, first we must examine how it would work if
it only took into account the probability that a specific tag occurs with a specific
word. This process works in ways very similar to n-grams (which are actually
instances of Markov Models) in that it makes the bigram assumption: a word's tag
can be determined simply based on the previous word's tag. Given that the model
has been trained on a tagged corpus, which already has part of speech information
added, it can calculate the probabilities that specific words serve as nouns, verbs,
or other parts of speech. Additionally, it can calculate the probability that one part
of speech occurs after another part of speech. The model's task can best be
understood by using an example, such as "to flower". We assume that we know the
proper tag for "to" and that we know "flower" can be a noun or a verb. Then we
seek to maximize the product of the probability that the tag for "flower" follows the
tag for "to" and the probability that, given we are expecting a certain tag, "flower"
is the word matching this tag. Expressed mathematically, we compare the values of
P(VERB | TO) P(flower | VERB) and P(NOUN | TO) P(flower | NOUN), as TO
is the correct tag for "to". This is the task that the model would face if it were
seeking to tag an individual word when given the preceding word's tag.
In actuality, HMMs usually try to tag whole sentences at one time, and they are not
given any tags that are certain. Thus, there are many more probabilities and
comparisons involved. To limit these computations and create a process that is
manageable given time and computing constraints, HMMs are usually

programmed to make some n-gram assumption, often a trigram assumption.


Consequently, long sentences can still be tagged without taking enormous amounts
of time, and assumptions are made that allow for linear tagging time rather than
exponential tagging time based on the length of the text to be tagged. Using the
techniques described, researchers have reported accuracy rates greater than
96%. Current research aims to increase these accuracy rates by taking more factors
than simply the previous (n-1) tags into account when calculating the next tag.
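The bigram comparison described above can be sketched directly. The probabilities below are illustrative corpus-style estimates (not real counts), and the class and method names are assumptions.

```java
// Hypothetical sketch of the bigram tag choice for a word like "flower"
// following "to": pick the tag maximizing P(tag | prevTag) * P(word | tag).
public class BigramTagChoice {

    static String bestTag(double pVerbGivenTo, double pWordGivenVerb,
                          double pNounGivenTo, double pWordGivenNoun) {
        double verbScore = pVerbGivenTo * pWordGivenVerb;
        double nounScore = pNounGivenTo * pWordGivenNoun;
        return verbScore >= nounScore ? "VERB" : "NOUN";
    }

    public static void main(String[] args) {
        // Illustrative probabilities: "to" is usually followed by a verb,
        // even though "flower" alone is usually a noun.
        double pVerbGivenTo = 0.83,    pFlowerGivenVerb = 0.00001;
        double pNounGivenTo = 0.00047, pFlowerGivenNoun = 0.00053;
        System.out.println(bestTag(pVerbGivenTo, pFlowerGivenVerb,
                                   pNounGivenTo, pFlowerGivenNoun));
    }
}
```

With these numbers the verb path wins (8.3e-6 versus about 2.5e-7), showing how the tag-sequence probability can override the word's more common tag.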
SentiWord:
SentiWord is a lexical resource for opinion mining. SentiWord assigns to each
synset of WordNet three sentiment scores: positivity, negativity and objectivity.
SentiWord is described in detail in the literature.
The SentiWord library contains several general purpose functions for performing a
binary search and modifying sorted files.
bin_search() is the primary binary search algorithm to search for key as the first
item on a line in the file pointed to by fp . The delimiter between the key and the
rest of the fields on the line, if any, must be a space. A pointer to a static variable
containing the entire line is returned. NULL is returned if a match is not found.
The remaining functions are not used by SentiWord, and are only briefly described.
copyfile() copies the contents of one file to another.
replace_line() replaces a line in a file having searchkey key with the contents of
new_line . It returns the original line or NULL in case of error.

insert_line() finds the proper place to insert the contents of new_line , having
searchkey key in the sorted file pointed to by fp . It returns NULL if a line with this
searchkey is already in the file.
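The bin_search() routine described above is a C library function; the following Java sketch is an illustrative reconstruction of its behavior (searching sorted lines for the one whose first space-delimited field matches a key), not the library's actual code.

```java
import java.util.Arrays;

// Hypothetical Java sketch of bin_search(): binary search over sorted
// lines for the line whose first space-delimited field equals `key`,
// returning the whole line, or null when there is no match.
public class LineSearch {

    static String binSearch(String[] sortedLines, String key) {
        int lo = 0, hi = sortedLines.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            // The search key is everything before the first space.
            String lineKey = sortedLines[mid].split(" ", 2)[0];
            int cmp = lineKey.compareTo(key);
            if (cmp == 0) return sortedLines[mid];
            if (cmp < 0) lo = mid + 1; else hi = mid - 1;
        }
        return null;  // no line starts with this key
    }

    public static void main(String[] args) {
        String[] lines = {
            "good 0.75 0.00 0.25",
            "happy 0.87 0.00 0.13",
            "terrible 0.00 0.62 0.38"
        };
        Arrays.sort(lines);  // the file must be sorted by key
        System.out.println(binSearch(lines, "happy"));
        System.out.println(binSearch(lines, "missing"));
    }
}
```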

Algorithm & Techniques


A phonetic algorithm is used for indexing words by their pronunciation. Most
phonetic algorithms were developed for use with the English language;
consequently, applying the rules to words in other languages might not give a
meaningful result.
They are necessarily complex algorithms with many rules and exceptions, because
English spelling and pronunciation is complicated by historical changes in
pronunciation and words borrowed from many languages.
SentiWord has been used for a number of different purposes in information
systems, including word sense disambiguation, information retrieval, automatic

text classification, automatic text summarization, machine translation and even


automatic crossword puzzle generation.

Term Frequency

Variants of TF weight:

    weighting scheme            TF weight
    binary                      0 or 1
    raw frequency               f(t,d)
    log normalization           1 + log f(t,d)
    double normalization 0.5    0.5 + 0.5 * f(t,d) / max_t' f(t',d)
    double normalization K      K + (1 - K) * f(t,d) / max_t' f(t',d)
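The TF variants above can be computed directly; here f(t,d) is the raw count of term t in document d, and maxFtd is the count of the most frequent term in d. The class and method names are illustrative, and the log base (natural log here) is an assumption since the table leaves it unspecified.

```java
// Sketch of the TF weighting variants from the table above.
public class TermFrequency {

    static double binary(int ftd)  { return ftd > 0 ? 1 : 0; }
    static double raw(int ftd)     { return ftd; }
    static double logNorm(int ftd) { return 1 + Math.log(ftd); }  // natural log

    // Double normalization with parameter k (k = 0.5 gives the 0.5 variant).
    static double doubleNormK(int ftd, int maxFtd, double k) {
        return k + (1 - k) * (double) ftd / maxFtd;
    }

    public static void main(String[] args) {
        int ftd = 3, maxFtd = 10;  // term occurs 3 times; top term occurs 10 times
        System.out.println(binary(ftd));                    // 1.0
        System.out.println(raw(ftd));                       // 3.0
        System.out.println(logNorm(ftd));                   // 1 + ln 3
        System.out.println(doubleNormK(ftd, maxFtd, 0.5));  // 0.5 + 0.5 * 3/10
    }
}
```

Double normalization damps the effect of document length: a term's weight depends on its frequency relative to the document's most frequent term, not on the absolute count.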

1. Achieving Guaranteed Anonymity in GPS Traces via Uncertainty-Aware Path Cloaking
Paper Description:
The integration of Global Positioning System (GPS) receivers and sensors
into mobile devices has enabled collaborative sensing applications, which monitor
the dynamics of environments through opportunistic collection of data from many
users' devices. One example that motivates this paper is a probe-vehicle-based
automotive traffic monitoring system, which estimates traffic congestion from GPS
velocity measurements reported from many drivers. This paper considers the
problem of achieving guaranteed anonymity in a locational data set that includes
location traces from many users, while maintaining high data accuracy. We
consider two methods to identify anonymous location traces, target tracking, and
home identification, and observe that known privacy algorithms cannot achieve
high application accuracy requirements or fail to provide privacy guarantees for
drivers in low-density areas. To overcome these challenges, we derive a novel
time-to-confusion criterion to characterize privacy in a locational data set and
propose a disclosure control algorithm (called the uncertainty-aware path cloaking
algorithm) that selectively reveals GPS samples to limit the maximum time-to-
confusion for all vehicles. Through trace-driven simulations using real GPS traces

from 312 vehicles, we demonstrate that this algorithm effectively limits tracking
risks, in particular, by eliminating tracking outliers. It also achieves significant data
accuracy improvements compared to known algorithms. We then present two
enhancements to the algorithm. First, it also addresses the home identification risk
by reducing location information revealed at the start and end of trips. Second, it
also considers heading information reported by users in the tracking model. This
version can thus protect users who are moving in dense areas but in a different
direction from the majority.
Advantage:
Data accuracy
Disadvantage:
It is not sufficient to protect privacy

2. Mining Mobile User Preferences for Personalized Context-Aware Recommendation
Paper Description:
Recent advances in mobile devices and their sensing capabilities have
enabled the collection of rich contextual information and mobile device usage
records through the device logs. These context-rich logs open a venue for
mining the personal preferences of mobile users under varying contexts and thus
enabling the development of personalized context-aware recommendation and
other related services, such as mobile online advertising. In this article, we
illustrate how to extract personal context-aware preferences from the context-rich
device logs, or context logs for short, and exploit these identified preferences for

building personalized context-aware recommender systems. A critical challenge
along this line is that the context log of each individual user may not contain
sufficient data for mining his or her context-aware preferences. Therefore, we
propose to first learn common context-aware preferences from the context logs of
many users. Then, the preference of each user can be represented as a distribution
of these common context-aware preferences. Specifically, we develop two
approaches for mining common context-aware preferences based on two different
assumptions, namely, context-independent and context-dependent assumptions,
which can fit into different application scenarios. Finally, extensive experiments on
a real-world dataset show that both approaches are effective and outperform
baselines with respect to mining personal context-aware preferences for mobile
users.
Advantage :
It can take advantage of topic models to learn common context-aware preferences
from many users' context logs.
Disadvantage :
The context log of each individual user may not contain sufficient data for mining his or her context-aware preferences.

3. Ranking Metric Anomaly in Invariant Networks


Paper Description:
The management of large-scale distributed information systems relies on the
effective use and modeling of monitoring data collected at various points in the

distributed information systems. A traditional approach to model monitoring data
is to discover invariant relationships among the monitoring data. Indeed, we can
discover all invariant relationships among all pairs of monitoring data and generate
invariant networks, where a node is a monitoring data source (metric) and a link
indicates an invariant relationship between two monitoring data. Such an invariant
network representation can help system experts to localize and diagnose the system
faults by examining those broken invariant relationships and their related metrics,
since system faults usually propagate among the monitoring data and eventually
lead to some broken invariant relationships. However, at one time, there are usually
a lot of broken links (invariant relationships) within an invariant network. Without
proper guidance, it is difficult for system experts to manually inspect this large
number of broken links. To this end, in this article, we propose the problem of
ranking metrics according to the anomaly levels for a given invariant network,
while this is a nontrivial task due to the uncertainties and the complex nature of
invariant networks. Specifically, we propose two types of algorithms for ranking
metric anomaly by link analysis in invariant networks. Along this line, we first
define two measurements to quantify the anomaly level of each metric, and
introduce the mRank algorithm. Also, we provide a weighted score mechanism and
develop the gRank algorithm, which involves an iterative process to obtain a score
to measure the anomaly levels. In addition, some extended algorithms based on
mRank and gRank algorithms are developed by taking into account the probability
of being broken as well as noisy links. Finally, we validate all the proposed
algorithms on a large number of real-world and synthetic data sets to illustrate the
effectiveness and efficiency of different algorithms.
Advantage:
It is simple and efficient.

Disadvantage :
It may not effectively follow the broken links to detect the fault with so
many broken links.

4. Enhancing Collaborative Filtering by User Interest Expansion via Personalized Ranking
Paper Description:
Recommender systems suggest a few items from many possible
choices to the users by understanding their past behaviors. In these systems, the
user behaviors are influenced by the hidden interests of the users. Learning to
leverage the information about user interests is often critical for making better
recommendations. However, existing collaborative-filtering-based recommender
systems are usually focused on exploiting the information about the users'
interaction with the systems; the information about latent user interests is largely
underexplored. To that end, inspired by topic models, in this paper, we propose
a novel collaborative-filtering-based recommender system by user interest
expansion via personalized ranking, named iExpand. The goal is to build an
item-oriented model-based collaborative-filtering framework. The iExpand method
introduces a three-layer, user-interests-item, representation scheme, which leads to
more accurate ranking recommendation results with less computation cost and
helps the understanding of the interactions among users, items, and user interests.
Moreover, iExpand strategically deals with many issues that exist in traditional
collaborative-filtering approaches, such as the over-specialization problem and
the cold-start problem. Finally, we evaluate iExpand on three benchmark data sets,

and experimental results show that iExpand can lead to better ranking performance
than state-of-the-art methods with a significant margin.
Advantage:
Interest expansion is better able to capture the diversified interests of users and
to find their potential interests.
Disadvantage:
It does not perform well.
5. Mobile App Classification with Enriched Contextual Information
Paper Description:
The study of the use of mobile Apps plays an important role in
understanding the user preferences, and thus provides the opportunities for
intelligent personalized context-based services. A key step for the mobile App
usage analysis is to classify Apps into some predefined categories. However, it is a
nontrivial task to effectively classify mobile Apps due to the limited contextual
information available for the analysis. For instance, there is limited contextual
information about mobile Apps in their names. However, this contextual
information is usually incomplete and ambiguous. To this end, in this paper, we
propose an approach for first enriching the contextual information of mobile Apps
by exploiting the additional Web knowledge from the Web search engine. Then,
inspired by the observation that different types of mobile Apps may be relevant to
different real-world contexts, we also extract some contextual features for mobile
Apps from the context-rich device logs of mobile users. Finally, we combine all the
enriched contextual information into the Maximum Entropy model for training a
mobile App classifier. To validate the proposed method, we conduct extensive
experiments on 443 mobile users' device logs to show both the effectiveness and
efficiency of the proposed approach. The experimental results clearly show that our

approach outperforms two state-of-the-art benchmark methods with a significant
margin.
Advantages :
It is robust and successfully applied to a wide range of NLP tasks.
Disadvantage:
It is difficult to train an effective classifier by only taking advantage of
the words in App names.
6. Latent Dirichlet Allocation
Paper Description:
We describe latent Dirichlet allocation (LDA), a generative
probabilistic model for collections of discrete data such as text corpora. LDA is a
three-level hierarchical Bayesian model, in which each item of a collection is
modeled as a finite mixture over an underlying set of topics. Each topic is, in
turn, modeled as an infinite mixture over an underlying set of topic probabilities. In
the context of text modeling, the topic probabilities provide an explicit
representation of a document. We present efficient approximate inference
techniques based on variational methods and an EM algorithm for empirical Bayes
parameter estimation. We report results in document modeling, text classification,
and collaborative filtering, comparing to a mixture of unigrams model and the
probabilistic LSI model.
Advantages:
It provides well-defined inference procedures for previously unseen documents.

Disadvantage:
It cannot be computed tractably.

7. A Taxi Driving Fraud Detection System
Paper Description:
Advances in GPS tracking technology have enabled us to install GPS
tracking devices in city taxis to collect a large amount of GPS traces under
operational time constraints. These GPS traces provide unparallel opportunities
for us to uncover taxi driving fraud activities. In this paper, we develop a taxi
driving fraud detection system, which is able to systematically investigate taxi
driving fraud. In this system, we first provide functions to find two aspects of
evidences: travel route evidence and driving distance evidence. Furthermore,
a third function is designed to combine the two aspects of evidences based on
Dempster-Shafer theory. To implement the system, we first identify interesting
sites from a large amount of taxi GPS logs. Then, we propose a parameter-free
method to mine the travel route evidences. Also, we introduce routemark to
represent a typical driving path from an interesting site to another one. Based on
routemark, we exploit a generative statistical model to characterize the distribution
of driving distance and identify the driving distance evidences. Finally, we evaluate
the taxi driving fraud detection system with large scale real-world taxi GPS logs. In
the experiments, we uncover some regularity of driving fraud activities and
investigate the motivation of drivers to commit a driving fraud by analyzing the
produced taxi fraud data.
Advantage:
Reduces traffic costs, many systems try to mask the identities of their users for
privacy considerations.
Disadvantage:
It is broadly used to increase resolution in the signal processing area, to
insert pseudo recorded points.
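The Dempster-Shafer combination used by the taxi fraud system above can be sketched for two evidences over the frame {FRAUD, LEGIT}. The mass values, class name and array layout are illustrative assumptions, not taken from the paper.

```java
// Hypothetical sketch of Dempster's combination rule for two evidences.
// Each mass function is an array: m[0]=m(FRAUD), m[1]=m(LEGIT),
// m[2]=m(FRAUD or LEGIT), i.e. the uncommitted mass.
public class DempsterShafer {

    static double[] combine(double[] m1, double[] m2) {
        // Conflict: mass assigned to contradictory singletons.
        double conflict = m1[0] * m2[1] + m1[1] * m2[0];
        double norm = 1 - conflict;  // Dempster normalization factor
        double fraud = (m1[0]*m2[0] + m1[0]*m2[2] + m1[2]*m2[0]) / norm;
        double legit = (m1[1]*m2[1] + m1[1]*m2[2] + m1[2]*m2[1]) / norm;
        double theta = (m1[2]*m2[2]) / norm;
        return new double[]{fraud, legit, theta};
    }

    public static void main(String[] args) {
        double[] route    = {0.6, 0.1, 0.3};  // travel-route evidence (illustrative)
        double[] distance = {0.7, 0.1, 0.2};  // driving-distance evidence (illustrative)
        double[] combined = combine(route, distance);
        System.out.printf("fraud=%.3f legit=%.3f uncertain=%.3f%n",
                          combined[0], combined[1], combined[2]);
    }
}
```

Two moderately suspicious evidences reinforce each other: the combined fraud mass (about 0.86 here) exceeds either input's, which is exactly why Dempster-Shafer suits evidence fusion.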

SYSTEM REQUIREMENTS
Software Requirements

    O/S        : Windows XP
    Language   : Java
    IDE        : NetBeans 6.9.1
    Database   : MySQL

Hardware Requirements

    System     : Pentium IV 2.4 GHz
    Hard Disk  : 160 GB
    Monitor    : 15" VGA color
    Mouse      : Logitech
    Keyboard   : 110 keys enhanced
    RAM        : 2 GB

SOFTWARE DESCRIPTION
Java
Java is a programming language originally developed by James Gosling at
Sun Microsystems (now a subsidiary of Oracle Corporation) and released in 1995
as a core component of Sun Microsystems' Java platform. The language derives
much of its syntax from C and C++ but has a simpler object model and fewer low-level facilities. Java applications are typically compiled to byte code (class files)
that can run on any Java Virtual Machine (JVM) regardless of computer
architecture. Java is a general-purpose, concurrent, class-based, object-oriented
language that is specifically designed to have as few implementation dependencies
as possible. It is intended to let application developers "write once, run anywhere."
Java is currently one of the most popular programming languages in use,
particularly for client-server web applications.
Java Platform:

One characteristic of Java is portability, which means that computer
programs written in the Java language must run similarly on any
hardware/operating-system platform. This is achieved by compiling the Java
language code to an intermediate representation called Java byte code, instead of
directly to platform-specific machine code. Java byte code instructions are
analogous to machine code, but are intended to be interpreted by a virtual machine
(VM) written specifically for the host hardware.
End-users commonly use a Java Runtime Environment (JRE) installed on
their own machine for standalone Java applications, or in a Web browser for Java
applets. Standardized libraries provide a generic way to access host-specific
features such as graphics, threading, and networking.
A major benefit of using byte code is portability. However, the overhead of
interpretation means that interpreted programs almost always run more slowly than
programs compiled to native executables would. Just-in-Time (JIT) compilers,
which compile byte code to machine code during runtime, were introduced at an
early stage.
Just as application servers such as Glass Fish provide lifecycle services to
web applications, the Net Beans runtime container provides them to Swing
applications. All new shortcuts should be registered in the "Keymaps/NetBeans"
folder. Shortcuts installed in the Shortcuts folder will be added to all keymaps, if
there is no conflict. This means that if the same shortcut is mapped to different
actions in the Shortcuts folder and the current keymap folder (like
Keymaps/NetBeans), the Shortcuts folder mapping will be ignored.
* Database Explorer Layer API in Database Explorer
* Loaders-text-db schema-Actions in Database Explorer

* Loaders-text-sql-Actions in Database Explorer


* Plug-in Registration in Java EE Server Registry
The keyword public denotes that a method can be called from code in other
classes, or that a class may be used by classes outside the class hierarchy. The class
hierarchy is related to the name of the directory in which the .java file is located.
The keyword static in front of a method indicates a static method, which is
associated only with the class and not with any specific instance of that class. Only
static methods can be invoked without a reference to an object. Static methods
cannot access any class members that are not also static. The keyword void
indicates that the main method does not return any value to the caller. If a Java
program is to exit with an error code, it must call System.exit() explicitly.
The method name "main" is not a keyword in the Java language. It is simply
the name of the method the Java launcher calls to pass control to the program. Java
classes that run in managed environments such as applets and Enterprise
JavaBeans do not use or need a main () method. A Java program may contain
multiple classes that have main methods, which means that the VM needs to be
explicitly told which class to launch from.
The Java launcher launches Java by loading a given class (specified on the
command line or as an attribute in a JAR) and starting its public static void
main(String[]) method. Stand-alone programs must declare this method explicitly.
The String [] args parameter is an array of String objects containing any arguments
passed to the class. The parameters to main are often passed by means of a
command line.
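A minimal illustration of the entry-point rules described above (the class name and the helper method are arbitrary choices for this sketch):

```java
// The launcher looks for exactly the signature public static void main(String[]).
public class EntryPoint {

    // Pure helper so the behavior is testable without running main.
    static String describe(String[] args) {
        if (args.length == 0) return "no arguments";
        return "first argument: " + args[0];
    }

    // Command-line arguments arrive in the args array; a program that
    // must exit with an error code would call System.exit(code) explicitly.
    public static void main(String[] args) {
        System.out.println(describe(args));
    }
}
```

Running `java EntryPoint hello` would print the first argument, while running it with no arguments prints "no arguments".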
Java a High-level Language:

Java is a high-level programming language developed by Sun Microsystems. It
was originally called OAK, and was designed for handheld devices and set-top
boxes. Oak was unsuccessful so in 1995 Sun changed the name to Java and
modified the language to take advantage of the burgeoning World Wide Web.
Java source code files (files with a .java extension) are compiled into a
format called byte code (files with a .class extension), which can then be executed
by a Java interpreter. Compiled Java code can run on most computers because Java
interpreters and runtime environments, known as Java Virtual Machines (VMs),
exist for most platforms.
Byte code can also be converted directly into machine language instructions by a
just-in-time compiler (JIT).
Java is a general purpose programming language with a number of features
that make the language well suited for use on the World Wide Web. Small Java
applications are called Java applets and can be downloaded from a Web server and
run on your computer by a Java-compatible Web browser, such as Netscape
Navigator or Microsoft Internet Explorer.
Object-Oriented Software Development using Java: Principles, Patterns, and
Frameworks contain a much applied focus that develops skills in designing
software-particularly in writing well-designed, medium-sized object-oriented
programs. It provides a broad and coherent coverage of object-oriented technology,
including object-oriented modeling using the Unified Modeling Language (UML)
object-oriented design using Design Patterns, and object-oriented programming
using Java.
Net Beans
The NetBeans Platform is a reusable framework for simplifying the
development of Java Swing desktop applications.
The NetBeans IDE bundle for Java SE contains what is needed to
start developing NetBeans plug-ins and NetBeans Platform based
applications; no additional SDK is required.
Applications can install modules dynamically. Any application
can include the Update Center module to allow users of the
application to download digitally-signed upgrades and new
features directly into the running application.


The platform offers reusable services common to desktop
applications, allowing developers to focus on the logic specific to
their application. Among the features of the platform are:
1 User interface management (e.g. menus and toolbars)
2 User settings management
3 Storage management (saving and loading any kind of data)
4 Window management
5 Wizard framework (supports step-by-step dialogs)
6 Net Beans Visual Library
7 Integrated Development Tools
J2EE
A Java EE application or a Java Platform, Enterprise Edition
application is any deployable unit of Java EE functionality. This can be a single
Java EE module or a group of modules packaged into an EAR file along with a
Java EE application deployment descriptor.

Enterprise applications can consist of the following:
1 EJB modules (packaged in JAR files)
2 Web modules (packaged in WAR files)
3 Connector modules or resource adapters (packaged in RAR files)
4 Session Initiation Protocol (SIP) modules (packaged in SAR files)
5 Application client modules
6 Additional JAR files containing dependent classes or other components
required by the application
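As an illustration of how such modules are grouped into an EAR, a Java EE 5 application deployment descriptor (application.xml) might look like the following sketch. The display name, module file names, and context root are hypothetical examples, not values from this project:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical EAR descriptor: one EJB module and one web module -->
<application xmlns="http://java.sun.com/xml/ns/javaee" version="5">
  <display-name>RankingFraudApp</display-name>
  <module>
    <ejb>fraud-detection-ejb.jar</ejb>
  </module>
  <module>
    <web>
      <web-uri>fraud-detection-web.war</web-uri>
      <context-root>/fraud</context-root>
    </web>
  </module>
</application>
```

The application server reads this descriptor at deployment time to decide how each archive inside the EAR should be initialized.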

Wamp Server
WAMPs are packages of independently-created programs
installed on computers that use a Microsoft Windows operating
system.
Apache is a web server. MySQL is an open-source database.
PHP is a scripting language that can manipulate information held
in a database and generate web pages dynamically each time
content is requested by a browser. Other programs may also be
included in a package, such as phpMyAdmin which provides a
graphical user interface for the MySQL database manager, or the
alternative scripting languages Python or Perl.
MySQL

The MySQL development project has made its source code available under
the terms of the GNU General Public License, as well as under a variety of
proprietary agreements. MySQL was owned and sponsored by a single for-profit
firm, the Swedish company MySQL AB, which is now owned by Oracle
Corporation. Free-software and open-source projects that require a
full-featured database management system often use MySQL. Applications that
use MySQL databases include TYPO3, Joomla, WordPress, phpBB, Drupal, and
other software built on the LAMP software stack.
Platforms and interfaces
Many programming languages with language-specific APIs
include libraries for accessing MySQL databases. These include
MySQL Connector/Net for integration with Microsoft's Visual Studio
(languages such as C# and VB are most commonly used) and the
JDBC driver for Java. In addition, an ODBC interface called
MyODBC allows additional programming languages that support
the ODBC interface to communicate with a MySQL database, such
as ASP or ColdFusion. The MySQL server and official libraries are
mostly implemented in ANSI C/ANSI C++.
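As a sketch of the JDBC route mentioned above: the snippet below builds a Connector/J-style JDBC URL and shows, in comments, how a query would then be issued. The host, port, database, credentials, and table name are hypothetical examples, not values from this project, and the actual connection requires the MySQL Connector/J JAR on the classpath and a running MySQL server.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class MySqlJdbcSketch {

    // Builds a JDBC URL in the form the MySQL Connector/J driver expects.
    static String buildUrl(String host, int port, String database) {
        return "jdbc:mysql://" + host + ":" + port + "/" + database;
    }

    public static void main(String[] args) {
        // Hypothetical host and database, for illustration only.
        String url = buildUrl("localhost", 3306, "appstore");
        System.out.println(url); // prints jdbc:mysql://localhost:3306/appstore

        // With Connector/J available and a server running, a query would
        // follow the standard JDBC pattern (shown here as a comment so the
        // sketch compiles without a database):
        //
        // try (Connection con = DriverManager.getConnection(url, "user", "password");
        //      Statement st = con.createStatement();
        //      ResultSet rs = st.executeQuery("SELECT app_name FROM apps")) {
        //     while (rs.next()) {
        //         System.out.println(rs.getString("app_name"));
        //     }
        // } catch (java.sql.SQLException e) {
        //     e.printStackTrace();
        // }
    }
}
```

The same pattern applies to any JDBC-compliant database; only the URL prefix and driver JAR change.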

FEASIBILITY STUDY
The feasibility study is carried out to test whether the proposed system is
worth implementing. The proposed system will be selected if it adequately
meets the performance requirements.
The feasibility study is carried out mainly in three areas, namely:
Economic Feasibility
Technical Feasibility
Behavioral Feasibility

Economic Feasibility
Economic analysis is the most frequently used method for evaluating the
effectiveness of a proposed system; it is more commonly known as cost-benefit
analysis. This procedure determines the benefits and savings that are expected
from the proposed system. The hardware in the systems department is sufficient
for system development.

Technical Feasibility
This study centers on the systems department's hardware and software and the
extent to which they can support the proposed system. Since the department
already has the required hardware and software, there is no question of
increasing the cost of implementing the proposed system. By these criteria,
the proposed system is technically feasible and can be developed with the
existing facilities.

Behavioral Feasibility
People are inherently resistant to change and need a sufficient amount of
training, which results in considerable expenditure for the organization. The
proposed system can generate reports with day-to-day information immediately
at the user's request, instead of producing a report that does not contain
much detail.

System Implementation
Implementation of software refers to the final installation of the
package in its real environment, to the satisfaction of the intended users and the
operation of the system. The people are not always sure that the software will
make their job easier, so:
1 The active user must be aware of the benefits of using the system
2 The user's confidence in the software must be built up
3 Proper guidance must be imparted to the user so that he is comfortable
using the application
Before going ahead and viewing the system, the user must know that, for
viewing the results, the server program should be running on the server. If
the server object is not running on the server, the actual processes will not
take place.

User Training

To achieve the objectives and benefits expected from the proposed system it
is essential for the people who will be involved to be confident of their role in the
new system. As systems become more complex, the need for education and
training becomes more and more important. Education is complementary to
training: it brings life to formal training by explaining the background and
the resources to the users. Education involves creating the right atmosphere
and motivating user staff, and educational information can make training more
interesting and more understandable.
Training on the Application Software


After providing the necessary basic training in computer
awareness, the users will have to be trained on the new application software. This
will give the underlying philosophy of the use of the new system such as the screen
flow, screen design, type of help on the screen, type of errors while entering the
data, the corresponding validation check at each entry and the ways to correct the
data entered. This training may be different across different user groups and across
different levels of hierarchy.

Operational Documentation
Once the implementation plan is decided, it is essential that the user of the
system is made familiar and comfortable with the environment. Documentation
covering the complete operation of the system has been developed, and useful
tips and guidance are given inside the application itself. The system is
developed to be user-friendly, so that the user can operate the system from
the tips given in the application itself.
System Maintenance
The maintenance phase of the software cycle is the time in which software
performs useful work. After a system is successfully implemented, it should be
maintained in a proper manner. System maintenance is an important aspect in the
software development life cycle. The need for system maintenance is to make
the system adaptable to changes in its environment. There may be social,
technical, and other environmental changes that affect a system once it is
implemented. Software product enhancements may involve providing new
functional capabilities, improving user displays and modes of interaction, or
upgrading the performance characteristics of the system. Only through proper
system maintenance procedures can the system be adapted to cope with these
changes. Software maintenance is, of course, far more than finding mistakes.

Corrective Maintenance
The first maintenance activity occurs because it is unreasonable to assume
that software testing will uncover all latent errors in a large software system.
During the use of any large program, errors will occur and be reported to the
developer. The process that includes the diagnosis and correction of one or more
errors is called Corrective Maintenance.

Adaptive Maintenance

The second activity that contributes to a definition of maintenance occurs


because of the rapid change that is encountered in every aspect of computing.
Therefore, adaptive maintenance, defined as an activity that modifies software
to properly interface with a changing environment, is both necessary and
commonplace.

Perfective Maintenance
The third activity that may be applied to a definition of maintenance occurs
when a software package is successful. As the software is used, recommendations
for new capabilities, modifications to existing functions, and general
enhancements are received from users. To satisfy requests in this category,
perfective maintenance is performed. This activity accounts for the majority
of all effort expended on software maintenance.

Preventive Maintenance
The fourth maintenance activity occurs when software is changed to improve
future maintainability or reliability, or to provide a better basis for future
enhancements. Often called preventive maintenance, this activity is
characterized by reverse engineering and re-engineering techniques.

CONCLUSION
This project developed a ranking fraud detection system for mobile Apps.
Specifically, we first showed that ranking fraud occurs in leading sessions
and provided a method for mining the leading sessions of each App from its
historical ranking records. The accompanying survey explored almost all
published fraud detection studies, defining the adversary, the types and
subtypes of fraud, the technical nature of the data, the performance metrics,
and the methods and techniques used. After identifying the limitations in
current fraud detection methods and techniques, this work shows that the
field can benefit from other related fields.
Future Enhancement
Specifically, unsupervised approaches from counter-terrorism work, actual
monitoring systems and text mining from law enforcement, and semi-supervised
and game-theoretic approaches from the intrusion and spam detection
communities can all contribute to future fraud detection research. However,
there are no guarantees: one fraud detection method was applied successfully
to news story monitoring but unsuccessfully to intrusion detection. Future
work will take the form of credit application fraud detection.

REFERENCES
[1] (2014). [Online]. Available: http://en.wikipedia.org/wiki/cohens_kappa
[2] (2014). [Online]. Available: http://en.wikipedia.org/wiki/information_retrieval
[3] (2012). [Online]. Available: https://developer.apple.com/news/index.php?id=02062012a
[4] (2012). [Online]. Available: http://venturebeat.com/2012/07/03/apples-crackdown-on-app-ranking-manipulation/
[5] (2012). [Online]. Available: http://www.ibtimes.com/apple-threatens-crackdown-biggest-app-store-ranking-fraud-406764
[6] (2012). [Online]. Available: http://www.lextek.com/manuals/onix/index.html


[7] (2012). [Online]. Available: http://www.ling.gu.se/lager/mogul/porterstemmer.
[8] L. Azzopardi, M. Girolami, and K. V. Rijsbergen, "Investigating the relationship between language model perplexity and IR precision-recall measures," in Proc. 26th Int. Conf. Res. Develop. Inform. Retrieval, 2003, pp. 369–370.
[9] D. M. Blei, A. Y. Ng, and M. I. Jordan, "Latent Dirichlet allocation," J. Mach. Learn. Res., pp. 993–1022, 2003.
[10] Y. Ge, H. Xiong, C. Liu, and Z.-H. Zhou, "A taxi driving fraud detection system," in Proc. IEEE 11th Int. Conf. Data Mining, 2011, pp. 181–190.
[11] D. F. Gleich and L.-H. Lim, "Rank aggregation via nuclear norm minimization," in Proc. 17th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2011, pp. 60–68.
[12] T. L. Griffiths and M. Steyvers, "Finding scientific topics," Proc. Nat. Acad. Sci. USA, vol. 101, pp. 5228–5235, 2004.
[13] G. Heinrich, "Parameter estimation for text analysis," Univ. Leipzig, Leipzig, Germany, Tech. Rep., http://faculty.cs.byu.edu/~ringger/CS601R/papers/HeinrichGibbsLDA.pdf, 2008.
[14] N. Jindal and B. Liu, "Opinion spam and analysis," in Proc. Int. Conf. Web Search Data Mining, 2008, pp. 219–230.
[15] J. Kivinen and M. K. Warmuth, "Additive versus exponentiated gradient updates for linear prediction," in Proc. 27th Annu. ACM Symp. Theory Comput., 1995, pp. 209–218.
[16] A. Klementiev, D. Roth, and K. Small, "An unsupervised learning algorithm for rank aggregation," in Proc. 18th Eur. Conf. Mach. Learn., 2007, pp. 616–623.
[17] A. Klementiev, D. Roth, and K. Small, "Unsupervised rank aggregation with distance-based models," in Proc. 25th Int. Conf. Mach. Learn., 2008, pp. 472–479.
[18] A. Klementiev, D. Roth, K. Small, and I. Titov, "Unsupervised rank aggregation with domain-specific expertise," in Proc. 21st Int. Joint Conf. Artif. Intell., 2009, pp. 1101–1106.
[19] E.-P. Lim, V.-A. Nguyen, N. Jindal, B. Liu, and H. W. Lauw, "Detecting product review spammers using rating behaviors," in Proc. 19th ACM Int. Conf. Inform. Knowl. Manage., 2010, pp. 939–948.

[20] Y.-T. Liu, T.-Y. Liu, T. Qin, Z.-M. Ma, and H. Li, "Supervised rank aggregation," in Proc. 16th Int. Conf. World Wide Web, 2007, pp. 481–490.
[21] A. Mukherjee, A. Kumar, B. Liu, J. Wang, M. Hsu, M. Castellanos, and R. Ghosh, "Spotting opinion spammers using behavioral footprints," in Proc. 19th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2013, pp. 632–640.
[22] A. Ntoulas, M. Najork, M. Manasse, and D. Fetterly, "Detecting spam web pages through content analysis," in Proc. 15th Int. Conf. World Wide Web, 2006, pp. 83–92.
[23] G. Shafer, A Mathematical Theory of Evidence. Princeton, NJ, USA: Princeton Univ. Press, 1976.
[24] K. Shi and K. Ali, "GetJar mobile application recommendations with very sparse datasets," in Proc. 18th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2012, pp. 204–212.
[25] N. Spirin and J. Han, "Survey on web spam detection: Principles and algorithms," SIGKDD Explor. Newslett., vol. 13, no. 2, pp. 50–64, May 2012.
[26] M. N. Volkovs and R. S. Zemel, "A flexible generative model for preference aggregation," in Proc. 21st Int. Conf. World Wide Web, 2012, pp. 479–488.
[27] Z. Wu, J. Wu, J. Cao, and D. Tao, "HySAD: A semi-supervised hybrid shilling attack detector for trustworthy product recommendation," in Proc. 18th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2012, pp. 985–993.
[28] S. Xie, G. Wang, S. Lin, and P. S. Yu, "Review spam detection via temporal pattern discovery," in Proc. 18th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2012, pp. 823–831.

[29] B. Yan and G. Chen, "AppJoy: Personalized mobile application discovery," in Proc. 9th Int. Conf. Mobile Syst., Appl., Serv., 2011, pp. 113–126.
[30] B. Zhou, J. Pei, and Z. Tang, "A spamicity approach to web spam detection," in Proc. SIAM Int. Conf. Data Mining, 2008, pp. 277–288.
[31] H. Zhu, H. Cao, E. Chen, H. Xiong, and J. Tian, "Exploiting enriched contextual information for mobile app classification," in Proc. 21st ACM Int. Conf. Inform. Knowl. Manage., 2012, pp. 1617–1621.
[32] H. Zhu, E. Chen, K. Yu, H. Cao, H. Xiong, and J. Tian, "Mining personal context-aware preferences for mobile users," in Proc. IEEE 12th Int. Conf. Data Mining, 2012, pp. 1212–1217.
[33] H. Zhu, H. Xiong, Y. Ge, and E. Chen, "Ranking fraud detection for mobile apps: A holistic view," in Proc. 22nd ACM Int. Conf. Inform. Knowl. Manage., 2013, pp. 619–628.
