www.globalsoftsolutions.in
rankings, and add some constraints to control the margin and slack variables of RASVM adaptively. Finally, ranking adaptability measurement is proposed to
quantitatively estimate whether an existing ranking model can be adapted to a new domain.
Experiments performed over Letor and two large scale data sets crawled from a
commercial search engine demonstrate the applicability of the proposed ranking
adaptation algorithms and the ranking adaptability measurement.
defined either explicitly or implicitly. In this paper, we introduce a novel
multiviewpoint-based similarity measure and two related clustering methods. The
major difference between a traditional dissimilarity/similarity measure and ours is
that the former uses only a single viewpoint, which is the origin, while the latter
utilizes many different viewpoints, which are objects assumed not to be in the same
cluster with the two objects being measured. Using multiple viewpoints, more
informative assessment of similarity could be achieved. Theoretical analysis and
empirical study are conducted to support this claim. Two criterion functions for
document clustering are proposed based on this new measure. We compare them
with several well-known clustering algorithms that use other popular similarity
measures on various document collections to verify the advantages of our proposal.
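As a rough illustration of the idea, the sketch below averages the inner product of difference vectors over a set of viewpoints; the function name and the exact aggregation are assumptions for illustration, not the paper's precise criterion function.

```python
import numpy as np

def multiviewpoint_similarity(di, dj, viewpoints):
    # For each viewpoint dh (assumed to lie outside the cluster of di
    # and dj), measure the two documents relative to dh instead of the
    # origin, then average over all viewpoints.
    sims = [float((di - dh) @ (dj - dh)) for dh in viewpoints]
    return sum(sims) / len(sims)
```

With a single viewpoint fixed at the origin this reduces to the ordinary dot product, which is how the traditional single-viewpoint measure appears as a special case.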
called topic anatomy, which summarizes and associates the core parts of a topic
temporally so that readers can understand the content easily. The proposed topic
anatomy model, called TSCAN, derives the major themes of a topic from the
eigenvectors of a temporal block association matrix. Then, the significant events of
the themes and their summaries are extracted by examining the constitution of the
eigenvectors. Finally, the extracted events are associated through their temporal
closeness and context similarity to form an evolution graph of the topic.
Experiments based on the official TDT4 corpus demonstrate that the generated
temporal summaries present the storylines of topics in a comprehensible form.
Moreover, in terms of content coverage, coherence, and consistency, the summaries
are superior to those derived by existing summarization methods, as judged
against human-composed reference summaries.
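A minimal numeric sketch of the eigenvector step, assuming a topic's time blocks are given as rows of a block-by-term matrix B: forming the association matrix as B Bᵀ and reading themes off the leading eigenvectors is a simplified reading of the approach, and `extract_themes` is a hypothetical name.

```python
import numpy as np

def extract_themes(block_term_matrix, n_themes):
    B = np.asarray(block_term_matrix, dtype=float)
    A = B @ B.T                       # temporal block association matrix
    vals, vecs = np.linalg.eigh(A)    # A is symmetric, so eigh applies
    order = np.argsort(vals)[::-1]    # strongest association first
    # Each returned column is one major theme; blocks with large
    # entries in a column constitute that theme's events.
    return vecs[:, order[:n_themes]]
```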
to either ranked keyword or pure Boolean retrieval. In particular, EBR models
produce meaningful rankings; their query model allows the representation of
complex concepts in an and-or format; and they are scrutable, in that the score
assigned to a document depends solely on the content of that document, unaffected
by any collection statistics or other external factors. These characteristics make
EBR models attractive in domains typified by medical and legal searching, where the
emphasis is on iterative development of reproducible complex queries of dozens or
even hundreds of terms. However, EBR is much more computationally expensive
than the alternatives. We consider the implementation of the p-norm approach to
EBR, and demonstrate that ideas used in the max-score and wand exact
optimization techniques for ranked keyword retrieval can be adapted to allow
selective bypass of documents via a low-cost screening process for this and similar
retrieval models. We also propose term-independent bounds that are able to
further reduce the number of score calculations for short, simple queries under the
extended Boolean retrieval model. Together, these methods yield an overall saving
from 50 to 80 percent of the evaluation cost on test queries drawn from biomedical
search.
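For reference, the p-norm operators at the heart of this model score a document from its term weights in [0, 1]; as p grows they approach strict Boolean AND/OR semantics. The sketch below is the textbook formulation, independent of the bypass and bounding optimizations described above.

```python
def pnorm_or(weights, p):
    # Soft OR: close to 1 if any term weight is high.
    n = len(weights)
    return (sum(w ** p for w in weights) / n) ** (1.0 / p)

def pnorm_and(weights, p):
    # Soft AND: penalized by every missing (low-weight) term.
    n = len(weights)
    return 1.0 - (sum((1.0 - w) ** p for w in weights) / n) ** (1.0 / p)
```

With p = 1 both operators collapse to the average of the term weights, i.e. a ranked-keyword-style score, which is why ideas from ranked keyword retrieval transfer to this setting.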
knowledge representation method for XML documents which is based on a typed
higher order logic formalism. With this representation method, an XML document is
represented as a higher order logic term where both its contents and structures
are captured. We then present a decision-tree learning algorithm driven by
precision/recall breakeven point (PRDT) for the XML classification problem which
can produce comprehensible theories. Finally, a semi-supervised learning algorithm
is given, which is based on the PRDT algorithm and the co-training framework.
Experimental results demonstrate that our framework is able to achieve good
performance in both supervised and semi-supervised learning with the bonus of
producing comprehensible learning theories.
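The precision/recall breakeven point that drives the PRDT learner can be computed directly: precision equals recall exactly when the ranking is cut at the number of relevant documents. A small sketch (the function name is assumed):

```python
def breakeven_point(ranked_labels):
    # ranked_labels: binary relevance labels, best-scored first.
    # Precision = tp/cutoff and recall = tp/total coincide when the
    # cutoff equals the total number of relevant documents.
    total = sum(ranked_labels)
    if total == 0:
        return 0.0
    return sum(ranked_labels[:total]) / total
```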
both the structure and the contents of Extensible Markup Language (XML)
documents, and can be stored in XML format as well. This mined knowledge is later
used to provide: 1) a concise idea (the gist) of both the structure and the content of
the XML document and 2) quick, approximate answers to queries. In this paper, we
focus on the second feature. A prototype system and experimental results
demonstrate the effectiveness of the approach.
and extended star-structure. Experimental studies are conducted on benchmark
document datasets to illustrate how the proposed approach can be applied flexibly
under different scenarios in real-world applications. The experimental results
demonstrate the feasibility and effectiveness of the new approach compared with
existing ones.
We study the research challenges in this new search framework, and propose
effective index structures and top-k algorithms that achieve a very high
interactive speed. We examine effective ranking functions and early termination
techniques to progressively identify the top-k relevant answers. We have
implemented our method on real data sets, and the experimental results show that
our method achieves high search efficiency and result quality.
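As one illustration of early termination in such a framework, the sketch below merges score-sorted posting lists for all terms matching a prefix and stops as soon as k distinct answers are emitted; the index layout and function name are assumptions, not the actual structures of the system.

```python
import heapq

def topk_prefix(index, prefix, k):
    # index: term -> list of (score, doc) pairs, sorted by descending score.
    matching = [postings for term, postings in index.items()
                if term.startswith(prefix)]
    # Merge the lists lazily in descending-score order; postings beyond
    # the k-th distinct answer are never touched (early termination).
    merged = heapq.merge(*matching, key=lambda p: -p[0])
    results, seen = [], set()
    for score, doc in merged:
        if doc not in seen:
            seen.add(doc)
            results.append((score, doc))
            if len(results) == k:
                break
    return results
```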
different kinds of recommendations are made on the Web every day, including
recommendations of movies, music, images, and books, as well as query
suggestions and tag recommendations. No matter what types of data sources are used for the
recommendations, essentially these data sources can be modeled in the form of
various types of graphs. In this paper, aiming at providing a general framework on
mining Web graphs for recommendations, (1) we first propose a novel diffusion
method which propagates similarities between different nodes and generates
recommendations; (2) then we illustrate how to generalize different
recommendation problems into our graph diffusion framework. The proposed
framework can be utilized in many recommendation tasks on the World Wide Web,
including query suggestions, tag recommendations, expert finding, image
recommendations, image annotations, etc. The experimental analysis on large data
sets shows the promising future of our work.
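A generic version of such a diffusion can be sketched as a random walk with restart on the graph's adjacency matrix; this is a common diffusion scheme used here for illustration, not the paper's exact propagation model.

```python
import numpy as np

def diffuse(adjacency, seed, alpha=0.85, iters=50):
    A = np.asarray(adjacency, dtype=float)
    # Column-normalize so each step redistributes a node's mass.
    P = A / np.maximum(A.sum(axis=0), 1e-12)
    s = np.asarray(seed, dtype=float)
    r = s.copy()
    for _ in range(iters):
        # Spread similarity along edges, while restarting at the seed.
        r = alpha * (P @ r) + (1 - alpha) * s
    return r
```

Nodes scoring highest relative to the seed (a query, tag, or user) become the recommendation candidates.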
Due to a wide range of potential applications, research on mobile commerce has
received considerable interest from both industry and academia. One of the
active topics is the mining and prediction of users' mobile
commerce behaviors such as their movements and purchase transactions. In this
paper, we propose a novel framework, called Mobile Commerce Explorer (MCE), for
mining and prediction of mobile users' movements and purchase transactions under
the context of mobile commerce. The MCE framework consists of three major
components: 1) Similarity Inference Model (SIM) for measuring the similarities
among stores and items, which are two basic mobile commerce entities considered
in this paper; 2) Personal Mobile Commerce Pattern Mine (PMCP-Mine) algorithm for
efficient discovery of mobile users' Personal Mobile Commerce Patterns (PMCPs);
and 3) Mobile Commerce Behavior Predictor (MCBP) for prediction of possible
mobile user behaviors. To the best of our knowledge, this is the first work that facilitates
mining and prediction of mobile users' commerce behaviors in order to recommend
stores and items previously unknown to a user. We perform an extensive
experimental evaluation by simulation and show that our proposals produce
excellent results.

Privacy preservation is important for machine learning and data mining, but
measures designed to protect private information often result in a trade-off:
reduced utility of the training samples. This paper introduces a privacy preserving
approach that can be applied to decision tree learning, without concomitant loss of
accuracy. It describes an approach to the preservation of the privacy of collected
data samples in cases where information from the sample database has been
partially lost. This approach converts the original sample data sets into a group of
unreal data sets, from which the original samples cannot be reconstructed without
the entire group of unreal data sets. Meanwhile, an accurate decision tree can be
built directly from those unreal data sets. This novel approach can be applied
directly to the data storage as soon as the first sample is collected. The approach
is compatible with other privacy preserving approaches, such as cryptography, for
extra protection.
Brute force and dictionary attacks on password-only remote login services are now
widespread and ever increasing. Enabling convenient login for legitimate users while
preventing such attacks is a difficult problem. Automated Turing Tests (ATTs)
continue to be an effective, easy-to-deploy approach to identify automated
malicious login attempts with reasonable cost of inconvenience to users. In this
paper, we discuss the inadequacy of existing and proposed login protocols designed
to address large-scale online dictionary attacks (e.g., from a botnet of hundreds of
thousands of nodes). We propose a new Password Guessing Resistant Protocol
(PGRP), derived by revisiting prior proposals designed to restrict such attacks.
While PGRP limits the total number of login attempts from unknown remote hosts
to as low as a single attempt per username, legitimate users in most cases (e.g.,
when attempts are made from known, frequently-used machines) can make several
failed login attempts before being challenged with an ATT. We analyze the
performance of PGRP with two real-world data sets and find it more promising than
existing proposals.
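The core decision of a PGRP-style protocol can be summarized as: consult the login history before showing an ATT. The thresholds, function name, and data layout below are illustrative assumptions, not the protocol's exact specification.

```python
def require_att(username, source, known_pairs, failed_counts,
                k_known=3, k_unknown=1):
    # A (username, source) pair seen before (e.g., via a cookie or an
    # IP history) gets k_known free failures; an unknown source gets
    # only k_unknown attempt(s) before an ATT challenge is required.
    fails = failed_counts.get((username, source), 0)
    if (username, source) in known_pairs:
        return fails >= k_known
    return fails >= k_unknown
```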
29. PROTECTING SENSITIVE LABELS IN SOCIAL NETWORK DATA
ANONYMIZATION
Privacy is one of the major concerns when publishing or sharing social network data
for social science research and business analysis. Recently, researchers have
developed privacy models similar to k-anonymity to prevent node re-identification through structure information. However, even when these privacy
models are enforced, an attacker may still be able to infer one's private
information if a group of nodes largely share the same sensitive labels (i.e.,
attributes). In other words, the label-node relationship is not well protected by
pure structure anonymization methods. Furthermore, existing approaches,
which rely on edge editing or node clustering and merging, may significantly
alter key graph properties. In this paper, we define a k-degree-l-diversity anonymity model that considers the protection of structural
information as well as sensitive labels of individuals. We further propose a
novel anonymization methodology based on adding noise nodes. We develop
several algorithms to add noise nodes into the original graph with the
consideration of introducing the least distortion to graph properties. Most
importantly, we provide a rigorous analysis of the theoretical upper bound on
the number of noise nodes added and their impacts on important graph
properties. We conduct extensive experiments to evaluate the effectiveness of
the proposed technique.
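A simplified reading of the k-degree-l-diversity condition can be checked directly: every degree value must be shared by at least k nodes, and each such group must carry at least l distinct sensitive labels. The checker below is an illustrative sketch, not the paper's anonymization algorithm.

```python
from collections import defaultdict

def is_k_degree_l_diverse(degrees, labels, k, l):
    # Group nodes by degree, then test group size (k-anonymity on
    # degree) and label variety (l-diversity) within each group.
    groups = defaultdict(list)
    for node, deg in degrees.items():
        groups[deg].append(labels[node])
    return all(len(g) >= k and len(set(g)) >= l
               for g in groups.values())
```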
The study of collective behavior aims to understand how individuals behave in a social
networking environment. Oceans of data generated by social media like Facebook,
Twitter, Flickr, and YouTube present opportunities and challenges to study
collective behavior on a large scale. In this work, we aim to learn to predict
collective behavior in social media. In particular, given information about some
individuals, how can we infer the behavior of unobserved individuals in the same
network? A social-dimension-based approach has been shown effective in
addressing the heterogeneity of connections presented in social media. However,
the networks in social media are normally of colossal size, involving hundreds of
thousands of actors. The scale of these networks entails scalable learning of
models for collective behavior prediction. To address the scalability issue, we
propose an edge-centric clustering scheme to extract sparse social dimensions.
With sparse social dimensions, the proposed approach can efficiently handle
networks of millions of actors while demonstrating a comparable prediction
performance to other non-scalable methods.
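The sparsity argument can be made concrete: once edges are partitioned into clusters (affiliations), a node's social dimensions are just the clusters its incident edges fall into, so each node touches only a handful of dimensions. A sketch under these assumptions, with hypothetical names:

```python
from collections import defaultdict

def sparse_social_dimensions(edges, edge_cluster):
    # edges: list of (u, v); edge_cluster: one cluster id per edge.
    # A node is affiliated with every cluster one of its edges joins,
    # which yields a sparse node-by-dimension representation.
    dims = defaultdict(set)
    for (u, v), c in zip(edges, edge_cluster):
        dims[u].add(c)
        dims[v].add(c)
    return dict(dims)
```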
scope of pointcuts. In this paper, we present an automated approach that limits
fragility problems by providing mechanical assistance in pointcut maintenance. The
approach is based on harnessing arbitrarily deep structural commonalities between
program elements corresponding to join points selected by a pointcut. The
extracted patterns are then applied to later versions to offer suggestions of new
join points that may require inclusion. To illustrate that the motivation behind our
proposal is well founded, we first empirically establish that join points captured by
a single pointcut typically portray a significant amount of unique structural
commonality by analyzing patterns extracted from 23 AspectJ programs. Then, we
demonstrate the usefulness of our technique by rejuvenating pointcuts in multiple
versions of three of these programs. The results show that our parameterized
heuristic algorithm was able to accurately and automatically infer the majority of
new join points in subsequent software versions that were not captured by the
original pointcuts.