
ISAS REPORT
HaNoiCTT

ISAS CASE 3: LOG MARKUP LANGUAGE SAMPLE

Developed by

Name: Nguyen Đình Long


Registration No.: R07300490045


LOG MARKUP LANGUAGE SAMPLE

Batch Code: B080059


Start Date: 20/6/2009
End Date: 7/7/2009
Name of the Coordinator : Nguyen Thanh Trung
Name of Developer: Nguyen Dinh Long
Date of Submission:


CERTIFICATE

This is to certify that this report titled Log Markup Language
Sample embodies the original work done by Nguyen Dinh Long in
partial fulfillment of their course requirement at NIIT.

Coordinator:


INTRODUCTION

Log Markup Language (LOGML) is an XML 1.0 application [XML-1.0]
designed to describe the log reports of web servers. Web-data mining
is one of the current hot topics in computer science. Mining data
collected from web server logfiles is useful not only for studying
customer choices but also for organizing web pages, because it reveals
which web pages web surfers access most frequently. The structure of
a web site is represented as a web graph (see the XGMML draft
specification). In mining the data from the log statistics, we use the
web graph to annotate the log information. We also give summary
reports comprising information such as client sites, types of
browsers, and usage time statistics. Finally, we gather the client
activity in a web site as a subgraph of the web site graph. This
subgraph can be used to gain a better understanding of general user
activity in the web site.

In LOGML, we create a new XML vocabulary to structurally express
the contents of the logfile information.


XGMML is an XML application for describing graphs. A web site can be
described as a graph in which the web pages are the nodes and the
hyperlinks are the edges. User visits to web pages and user
traversals of hyperlinks can be represented as a web graph whose
nodes and edges carry the number of visits by users. User sessions
are subgraphs of the web graph in which the time of each visit is
also recorded. A LOGML file contains the web graph whose nodes are
the web pages that have been visited at least once and whose edges
are the hyperlinks that have been traversed at least once. We call
this graph a "Log Graph". The rest of the LOGML file reports
additional information, such as top hosts, top browsers, and top
keywords. LOGML uses XGMML to describe the log graphs and adds
attributes to the nodes and edges to store information such as the
number of hits.
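
The log graph described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name build_log_graph, the (source, target) input format, and the sample page paths are hypothetical and are not part of the LOGML or XGMML specifications. It shows how only visited pages become nodes and only traversed hyperlinks become edges, each annotated with a hit count.

```python
from collections import Counter


def build_log_graph(traversals):
    """Build a log graph from (source_page, target_page) pairs.

    A source of None means the visit did not arrive through a
    hyperlink of the site (hypothetical convention for this sketch).
    Returns (node_hits, edge_hits): hits per visited page and
    traversal counts per hyperlink.
    """
    node_hits = Counter()
    edge_hits = Counter()
    for source, target in traversals:
        # Every request counts as one hit on the target page.
        node_hits[target] += 1
        # Only requests that followed a hyperlink add an edge.
        if source is not None:
            edge_hits[(source, target)] += 1
    return node_hits, edge_hits


# Hypothetical sample data, not taken from a real logfile.
traversals = [
    (None, "/index.html"),
    ("/index.html", "/news.html"),
    ("/index.html", "/news.html"),
    ("/news.html", "/contact.html"),
    (None, "/index.html"),
]
node_hits, edge_hits = build_log_graph(traversals)

for page, hits in sorted(node_hits.items()):
    print(page, hits)
for (source, target), hits in sorted(edge_hits.items()):
    print(source, "->", target, hits)
```

Pages that never appear as a target and hyperlinks that are never followed do not enter the counters, which matches the "visited at least once" and "traversed at least once" conditions.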


ACTIVITIES LIST

A typical LOGML document has three sections. The first section
is a graph that describes the log graph of the users' visits to
web pages and hyperlinks. This section uses XGMML to describe
the graph, and its root element is the graph element. The second
section holds the additional information of the log reports, such
as top visiting hosts, top user agents, and top keywords. The third
section is the report of the user sessions. Each user session is a
subgraph of the log graph. The subgraphs are reported as lists of
edges that refer to the nodes of the log graph. Each edge of a
user session also has a timestamp recording when the edge was
traversed. This timestamp helps to compute the total time of the
user session.
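
As a rough illustration of how the edge timestamps can yield the total time of a user session, the following Python sketch takes the difference between the earliest and latest traversal. The dictionary keys (source, target, time), the ISO 8601 timestamp format, and the function name session_duration_seconds are assumptions made for this example; the actual LOGML attribute names and time format may differ.

```python
from datetime import datetime


def session_duration_seconds(session_edges):
    """Return the seconds between the first and last edge traversal.

    session_edges is a list of dicts with a "time" key holding an
    ISO 8601 timestamp (hypothetical layout for this sketch).
    """
    times = [datetime.fromisoformat(edge["time"]) for edge in session_edges]
    if not times:
        return 0.0
    return (max(times) - min(times)).total_seconds()


# Hypothetical user session: a list of timestamped edges that
# refer to pages (nodes) of the log graph.
session = [
    {"source": "/index.html", "target": "/news.html",
     "time": "2009-06-20T10:00:00"},
    {"source": "/news.html", "target": "/contact.html",
     "time": "2009-06-20T10:02:30"},
    {"source": "/contact.html", "target": "/index.html",
     "time": "2009-06-20T10:05:10"},
]

duration = session_duration_seconds(session)
print(duration)  # 310.0
```

Measured this way, the duration covers only the interval between traversals; the time spent on the last page of the session is not captured by the edge timestamps alone.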


LOGML SAMPLE


STRUCTURE OF LOGML


LOG GRAPH OF RPI INFO WEBSITE


LOG GRAPH OF RPI NEWS MAGAZINE WEBSITE


PROJECT FILE DETAILS
