
1) A Web Usage Mining Framework for Mining Evolving User Profiles in Dynamic Web Sites.

In this paper, we present a complete framework and findings in mining Web usage patterns from Web log files of a real Web site that has all the challenging aspects of real-life Web usage mining, including evolving user profiles and external data describing an ontology of the Web content. Even though the Web site under study is part of a nonprofit organization that does not "sell" any products, it was crucial to understand "who" the users were, "what" they looked at, and "how their interests changed with time," all of which are important questions in Customer Relationship Management (CRM). Hence, we present an approach for discovering and tracking evolving user profiles. We also describe how the discovered user profiles can be enriched with explicit information need that is inferred from search queries extracted from Web log data. Profiles are also enriched with other domain-specific information facets that give a panoramic view of the discovered mass usage modes. An objective validation strategy is also used to assess the quality of the mined profiles, in particular their adaptability in the face of evolving user behavior.

2) A Similarity Metric for Retrieval of Compressed Objects: Application for Mining Satellite Image Time Series.

This paper addresses the problem of building an index of compressed object databases. We introduce an informational similarity measure based on the coding length of two-part codes. Then, we present a methodology for compressing the database by taking into account interobject redundancies and by using the informational similarity measure. The method produces an index included in the code of the data volume. This index is built such that it contains the minimal sufficient information to discriminate the data-volume objects. Then, we present an optimal two-part coder for compressing spatio-temporal events contained in satellite image time series (SITS). The two-part coder allows us to measure similarity and then to derive an optimal index of SITS spatio-temporal events. The resulting index is representative of the SITS information content and enables queries based on information content.
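
The general idea of measuring similarity through coding length can be illustrated with a minimal sketch. The abstract does not specify the two-part-code measure, so this is not the paper's measure: it swaps in the well-known normalized compression distance, with zlib as a stand-in coder. The function names `compressed_len` and `ncd` are hypothetical.

```python
import zlib


def compressed_len(data: bytes) -> int:
    """Coding length of `data` in bytes, using zlib as a stand-in coder."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: small when x and y share structure.

    Assumption: this substitutes for the paper's two-part-code similarity,
    whose definition is not given in the abstract.
    """
    cx, cy = compressed_len(x), compressed_len(y)
    cxy = compressed_len(x + y)  # cost of coding both objects together
    return (cxy - min(cx, cy)) / max(cx, cy)


if __name__ == "__main__":
    a = b"abcabcabc" * 50
    b = bytes(range(256)) * 2
    print("ncd(a, a) =", ncd(a, a))
    print("ncd(a, b) =", ncd(a, b))
```

Objects that compress well together (shared redundancy) get a small distance, which is the intuition behind exploiting interobject redundancy when building an index.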

3) A Fast Algorithm for Learning a Ranking Function from Large-Scale Data Sets.

We consider the problem of learning a ranking function that maximizes a generalization of the Wilcoxon-Mann-Whitney statistic on the training data. Relying on an ε-accurate approximation for the error function, we reduce the computational complexity of each iteration of a conjugate gradient algorithm for learning ranking functions from O(m^2) to O(m), where m is the number of training samples. Experiments on public benchmarks for ordinal regression and collaborative filtering indicate that the proposed algorithm is as accurate as the best available methods in terms of ranking accuracy, when the algorithms are trained on the same data. However, since it is several orders of magnitude faster than the current state-of-the-art approaches, it is able to leverage much larger training data sets.
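
To make the quadratic cost concrete, here is a minimal sketch of the plain two-class Wilcoxon-Mann-Whitney statistic computed by direct pairwise comparison. It is not the paper's generalized statistic or its fast approximation. It only shows why a naive evaluation touches every pair of samples. The function name `wmw_statistic` and the half-credit treatment of ties are assumptions.

```python
def wmw_statistic(scores_pos, scores_neg):
    """Fraction of (positive, negative) pairs that are ranked correctly.

    Naive evaluation: every positive score is compared with every negative
    score, so the cost grows with the product of the two list lengths.
    Assumption: ties count as half a correct pair.
    """
    correct = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(scores_pos) * len(scores_neg))


if __name__ == "__main__":
    # Three of the four pairs are ordered correctly.
    print(wmw_statistic([3.0, 2.0], [1.0, 2.5]))  # → 0.75
```

Because the double loop visits all pairs, any gradient-based optimizer that evaluates such a pairwise objective directly pays a quadratic cost per iteration, which is the bottleneck the abstract describes.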
