
CHAPTER NO    TITLE

              ABSTRACT
              LIST OF TABLES
              LIST OF FIGURES
              LIST OF ABBREVIATIONS

1.            INTRODUCTION
              1.1 PROJECT DESCRIPTION
2.            SYSTEM STUDY
              2.1 FEASIBILITY STUDY
              2.2 EXISTING SYSTEM
              2.3 PROPOSED SYSTEM
3.            SYSTEM SPECIFICATION
              3.1 SOFTWARE REQUIREMENTS
              3.2 HARDWARE REQUIREMENTS
4.            LANGUAGE SPECIFICATION
              4.1 FEATURES OF .NET
              4.2 FEATURES OF C#.NET
              4.3 FEATURES OF SQL SERVER 2005
5.            SYSTEM DESIGN
              5.1 INPUT DESIGN
              5.2 OUTPUT DESIGN
              5.3 DATABASE DESIGN
              5.4 DATA FLOW DIAGRAM
              5.5 SYSTEM FLOW DIAGRAM
6.            SYSTEM TESTING AND MAINTENANCE
              6.1 UNIT TESTING
              6.2 INTEGRATION TESTING
              6.3 VALIDATION
7.            SYSTEM IMPLEMENTATION
              7.1 SCOPE FOR FUTURE DEVELOPMENT
8.            CONCLUSION
              BIBLIOGRAPHY
              APPENDIX
                  SCREEN SHOT
                  DATA TABLE STRUCTURE
                  SAMPLE CODING

LIST OF FIGURES

FIGURE NO     NAME
1             .NET FRAMEWORK
2             INTEROPERABILITY
3             WEB CONTROLS

Mining Website for Analyzing Users' Interests


1. INTRODUCTION

The World Wide Web continues to grow in size, and the e-business and e-commerce
sectors are evolving rapidly. Web marketplaces therefore need to predict user needs
in order to improve usability: to provide users with the information they want or
need without expecting them to ask for it explicitly, by taking advantage of the
knowledge gained from users' navigational behaviour and individual interests. This
raises the issues of customization versus personalization, the categorization and
pre-processing of Web data, the extraction of correlations between and across
different kinds of such data, and the determination of the actions that should be
recommended by such a personalization system. Web data are those that can be
collected and used in the context of Web personalization. These data are classified
into four categories.
o Content data are presented to the end-user appropriately structured.
  They can be simple text, images, or structured data, such as
  information retrieved from databases.
o Structure data represent the way content is organized. They can be
  either data entities used within a Web page, such as HTML or XML
  tags, or data entities used to put a Web site together, such as
  hyperlinks connecting one page to another.
o Usage data represent a Web site's usage, such as a visitor's IP address,
  time and date of access, complete path (files or directories) accessed,
  referrer's address, and other attributes that can be included in a Web
  access log.
o User profile data provide information about the users of a Web site. A
  user profile contains demographic information (such as name, age,
  country, marital status, education, interests, etc.) for each user of a
  Web site, as well as information about the user's interests and preferences.

Scope of the Project:

As the size of the Web increases along with the number of users, it is essential
for website owners to better understand their customers so that they can provide
better service and enhance the quality of the website. To achieve this they depend
on the web access log files, which can be mined to extract interesting patterns so
that user behaviour can be understood. Access of web pages over periods of time,
e.g. daily, monthly and yearly, is registered. This project presents an overview of
web usage mining and also provides a survey of the pattern extraction algorithms
used for web usage mining. It is a comprehensive access log analysis tool. It allows
you to keep track of activity on your site by month, week, day and hour; to monitor
total hits, total visitors, total successful hits and page views; and to keep track of
your most popular pages. The system has two actors, Administrator and User. The
Administrator extracts interesting patterns from the pre-processed web logs and can
obtain general statistics such as the number of hits and the number of visitors. A
User can log in, view the desired information and search for what they need.
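For illustration, the following minimal C# sketch (not part of the delivered system) shows the kind of hourly hit count described above, assuming an access.log file in Common Log Format:

    using System;
    using System.Globalization;
    using System.IO;
    using System.Linq;

    class LogHitCounter
    {
        static void Main()
        {
            // Assumed input: a Common Log Format access log, e.g.
            // 127.0.0.1 - - [10/Oct/2013:13:55:36 +0530] "GET /index.html HTTP/1.1" 200 2326
            var hitsPerHour = File.ReadLines("access.log")
                .Select(line =>
                {
                    int start = line.IndexOf('[') + 1;
                    int end = line.IndexOf(']');
                    // Keep only the date-time part, dropping the timezone offset.
                    string stamp = line.Substring(start, end - start).Split(' ')[0];
                    return DateTime.ParseExact(stamp, "dd/MMM/yyyy:HH:mm:ss",
                                               CultureInfo.InvariantCulture);
                })
                .GroupBy(t => t.Hour)          // aggregate hits by hour of day
                .OrderBy(g => g.Key);

            foreach (var group in hitsPerHour)
                Console.WriteLine("Hour {0:00}: {1} hits", group.Key, group.Count());
        }
    }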

2. SYSTEM STUDY

2.1 FEASIBILITY STUDY


The feasibility of the project is analyzed in this phase, and a business proposal
is put forth with a very general plan for the project and some cost estimates.
During system analysis the feasibility study of the proposed system is carried out,
to ensure that the proposed system is not a burden to the company. For feasibility
analysis, some understanding of the major requirements for the system is essential.
Three key considerations involved in the feasibility analysis are:
ECONOMIC FEASIBILITY
TECHNICAL FEASIBILITY
SOCIAL FEASIBILITY

ECONOMIC FEASIBILITY
This study is carried out to check the economic impact that the system
will have on the organization. The amount of funds that the company can pour into
the research and development of the system is limited, so the expenditure must be
justified. The developed system is well within the budget, which was achieved
because most of the technologies used are freely available. Only the customized
products had to be purchased.
TECHNICAL FEASIBILITY
This study is carried out to check the technical feasibility, that is, the technical
requirements of the system. Any system developed must not place a high demand
on the available technical resources, since that would in turn place high demands
on the client. The developed system therefore has modest requirements, as only
minimal or no changes are required for implementing this system.
SOCIAL FEASIBILITY
This aspect of the study checks the level of acceptance of the system by the
user. This includes the process of training the user to use the system efficiently.
The user must not feel threatened by the system, but must instead accept it as a
necessity. The level of acceptance by the users solely depends on the methods that
are employed to educate the users about the system and to make them familiar with
it. Their level of confidence must be raised so that they are also able to offer
constructive criticism, which is welcomed, as they are the final users of the system.
2.2 Existing System:
Good and effective Customer Relationship Management (CRM) needs a
clear understanding of customer requirements. The management should have
up-to-date knowledge of the customers' needs and should act accordingly.
Gathering information about web user (customer) profiles and analysing those
profiles is not easy for large data. For example, the Nokia site is designed to let
customers search for exact cell phone models based on their interests; if all the
user profiles are analysed, the most wanted cell phone model can be identified.
Analysing web user profiles and making decisions and predictions is difficult,
and the accessed data may not be stored at the server side [1]. Many Web sites
provide a web user interface to their customers/users but do not collect all the
accessed information, because they feel it is not important.
2.3 Proposed System:

The proposed system collects and mines the collected data, which can improve
the company's growth, and the mining techniques also improve the quality of the
user interface. The proposed model consists of the following activities: collecting
all the Web access data on a web server and preparing the collected information
as a data set (pre-pruning process); creating per-user profiles; making decisions
based on the user profiles; showing the decisions to the company on the Web site;
collecting Web data such as activities/click streams recorded in Web server logs;
preprocessing the Web data, such as filtering requests and identifying unique
regions; and interpreting/evaluating the discovered profiles.

3. SYSTEM SPECIFICATION

3.1 SOFTWARE REQUIREMENTS

Operating system : Windows 7
Front End        : Microsoft Visual Studio .NET 2010
Coding Language  : C#
Back End         : SQL Server 2005

3.2 HARDWARE REQUIREMENTS

System    : Pentium IV 2.4 GHz
Hard disk : 40 GB
Mouse     : Logitech
RAM       : 2 GB (minimum)
Keyboard  : 110 keys enhanced

4. LANGUAGE SPECIFICATION

4.1 FEATURES OF .NET

Microsoft .NET is a set of Microsoft software technologies for rapidly


building and integrating XML Web services, Microsoft Windows-based
applications, and Web solutions. The .NET Framework is a language-neutral
platform for writing programs that can easily and securely interoperate. There's no
language barrier with .NET: there are numerous languages available to the
developer, including Managed C++, C#, Visual Basic and JScript. The .NET
framework provides the foundation for components to interact seamlessly, whether

locally or remotely on different platforms. It standardizes common data types and


communications protocols so that components created in different languages can
easily interoperate.
.NET is also the collective name given to various software
components built upon the .NET platform. These will be both products (Visual
Studio.NET and Windows.NET Server, for instance) and services (like Passport,
.NET My Services, and so on).
THE .NET FRAMEWORK

The .NET Framework has two main parts:
1. The Common Language Runtime (CLR).
2. A hierarchical set of class libraries.

The CLR is described as the execution engine of .NET. It provides the
environment within which programs run. Its most important features are:
o Conversion from a low-level assembler-style language, called Intermediate
  Language (IL), into code native to the platform being executed on.
o Memory management, notably including garbage collection.
o Checking and enforcing security restrictions on the running code.
o Loading and executing programs, with version control and other such features.

The following features of the .NET framework are also worth describing:

MANAGED CODE
Managed code is code that targets .NET and contains certain extra
information - metadata - to describe itself. Whilst both managed and unmanaged
code can run in the runtime, only managed code contains the information that
allows the CLR to guarantee, for instance, safe execution and interoperability.

MANAGED DATA
With managed code comes managed data. The CLR provides memory
allocation and deallocation facilities, and garbage collection. Some .NET
languages use managed data by default, such as C#, Visual Basic.NET and
JScript.NET, whereas others, namely C++, do not. Targeting the CLR can, depending
on the language you're using, impose certain constraints on the features available.
As with managed and unmanaged code, one can have both managed and
unmanaged data in .NET applications - data that doesn't get garbage collected but
instead is looked after by unmanaged code.
COMMON TYPE SYSTEM
The CLR uses something called the Common Type System (CTS) to strictly
enforce type-safety. This ensures that all classes are compatible with each other by
describing types in a common way. The CTS defines how types work within the
runtime, which enables types in one language to interoperate with types in another
language, including cross-language exception handling. As well as ensuring that
types are only used in appropriate ways, the runtime also ensures that code doesn't
attempt to access memory that hasn't been allocated to it.

COMMON LANGUAGE SPECIFICATION


The CLR provides built-in support for language interoperability. To ensure
that you can develop managed code that can be fully used by developers using any
programming language, a set of language features and rules for using them called
the Common Language Specification (CLS) has been defined. Components that
follow these rules and expose only CLS features are considered CLS-compliant.
THE CLASS LIBRARY
.NET provides a single-rooted hierarchy of classes, containing over 7000
types. The root of the namespace is called System; this contains basic types like
Byte, Double, Boolean, and String, as well as Object. All objects derive from
System.Object. As well as objects, there are value types. Value types can be
allocated on the stack, which can provide useful flexibility. There are also efficient
means of converting value types to object types if and when necessary.
The set of classes is pretty comprehensive, providing collections, file,
screen, and network I/O, threading, and so on, as well as XML and database
connectivity.
The class library is subdivided into a number of sets (or namespaces), each
providing distinct areas of functionality, with dependencies between the
namespaces kept to a minimum.
LANGUAGES SUPPORTED BY .NET
The multi-language capability of the .NET Framework and Visual
Studio .NET enables developers to use their existing programming skills to build
all types of applications and XML Web services. The .NET framework supports

new versions of Microsoft's old favorites Visual Basic and C++ (as VB.NET and
Managed C++), but there are also a number of new additions to the family.
Visual Basic .NET has been updated to include many new and
improved language features that make it a powerful object-oriented programming
language. These features include inheritance, interfaces, and overloading, among
others. Visual Basic also now supports structured exception handling, custom
attributes and also supports multi-threading.
Visual Basic .NET is also CLS-compliant, which means that any CLS-compliant language can use the classes, objects, and components you create in
Visual Basic .NET.
Managed Extensions for C++ and attributed programming are just
some of the enhancements made to the C++ language. Managed Extensions
simplify the task of migrating existing C++ applications to the new .NET
Framework.
C# is Microsoft's new language. It's a C-style language that is
essentially C++ for Rapid Application Development. Unlike other languages, its
specification is just the grammar of the language. It has no standard library of its
own, and instead has been designed with the intention of using the .NET libraries
as its own.
Microsoft Visual J# .NET provides the easiest transition for Java-language
developers into the world of XML Web Services and dramatically
improves the interoperability of Java-language programs with existing software
written in a variety of other programming languages.

ActiveState has created Visual Perl and Visual Python, which
enable .NET-aware applications to be built in either Perl or Python. Both products
can be integrated into the Visual Studio .NET environment. Visual Perl includes
support for ActiveState's Perl Dev Kit.
Other languages for which .NET compilers are available include

FORTRAN

COBOL

Eiffel

Fig 1: .NET Framework (layers, top to bottom: ASP.NET and Windows Forms; XML Web Services; Base Class Libraries; Common Language Runtime; Operating System)

4.2 FEATURES OF C#.NET

C#.NET is compliant with the CLS (Common Language Specification)
and supports structured exception handling. The CLS is a set of rules and
constructs that are supported by the CLR (Common Language Runtime). The
CLR is the runtime environment provided by the .NET Framework; it manages
the execution of the code and also makes the development process easier by
providing services.

Because C#.NET is a CLS-compliant language, any objects, classes, or components
created in C#.NET can be used in any other CLS-compliant language. In addition,
we can use objects, classes, and components created in other CLS-compliant
languages in C#.NET. The use of the CLS ensures complete interoperability
among applications, regardless of the languages used to create them.
CONSTRUCTORS AND DESTRUCTORS:
Constructors are used to initialize objects, whereas destructors are
used to destroy them. In other words, destructors are used to release the
resources allocated to the object. In C#.NET a destructor (which compiles to the
Finalize method) is available. It is used to complete the tasks that must be
performed when an object is destroyed, and it is called automatically when the
object is destroyed. In addition, Finalize can be invoked only from the class it
belongs to or from derived classes.
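For illustration, a minimal C# sketch of a constructor and a destructor (finalizer); the LogSession class is invented for this example, not taken from the project code:

    using System;

    class LogSession
    {
        private readonly string _visitorIp;

        // Constructor: initializes the object when it is created.
        public LogSession(string visitorIp)
        {
            _visitorIp = visitorIp;
            Console.WriteLine("Session started for " + _visitorIp);
        }

        // Destructor (finalizer): called automatically by the runtime before the
        // object's memory is reclaimed; used to release resources.
        ~LogSession()
        {
            Console.WriteLine("Session finalized for " + _visitorIp);
        }
    }

    class Program
    {
        static void Main()
        {
            var s = new LogSession("192.168.1.10");
            // s becomes unreachable here; its finalizer runs when the GC collects it.
        }
    }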

GARBAGE COLLECTION
Garbage Collection is another new feature in C#.NET. The .NET
Framework monitors allocated resources, such as objects and variables. In
addition, the .NET Framework automatically releases memory for reuse by
destroying objects that are no longer in use.
In C#.NET, the garbage collector checks for the objects that are not currently in
use by applications. When the garbage collector comes across an object that is
marked for garbage collection, it releases the memory occupied by the object.
OVERLOADING
Overloading is another feature in C#. Overloading enables us to define
multiple procedures with the same name, where each procedure has a different
set of arguments. Besides using overloading for procedures, we can use it for
constructors and properties in a class.
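For illustration, a minimal C# sketch of method overloading; the HitCounter class and its methods are invented for this example:

    using System;

    class HitCounter
    {
        // Overloaded methods: same name, different parameter lists.
        public int CountHits(string page)
        {
            Console.WriteLine("Counting all hits for " + page);
            return 0; // placeholder value for illustration
        }

        public int CountHits(string page, DateTime day)
        {
            Console.WriteLine("Counting hits for " + page + " on " + day.ToShortDateString());
            return 0; // placeholder value for illustration
        }

        static void Main()
        {
            var counter = new HitCounter();
            counter.CountHits("/index.html");                 // calls the one-argument overload
            counter.CountHits("/index.html", DateTime.Today); // calls the two-argument overload
        }
    }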

MULTITHREADING:
C#.NET also supports multithreading. An application that supports
multithreading can handle multiple tasks simultaneously. We can use
multithreading to decrease the time taken by an application to respond to user
interaction.
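For illustration, a minimal C# sketch that processes two hypothetical log files on separate threads so neither blocks the other:

    using System;
    using System.Threading;

    class ParallelLogProcessing
    {
        static void Main()
        {
            // One worker thread per (hypothetical) log file.
            var t1 = new Thread(() => Process("access_jan.log"));
            var t2 = new Thread(() => Process("access_feb.log"));
            t1.Start();
            t2.Start();
            t1.Join();   // wait for both worker threads to finish
            t2.Join();
        }

        static void Process(string file)
        {
            Console.WriteLine("Processing " + file + " on thread " +
                              Thread.CurrentThread.ManagedThreadId);
        }
    }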
STRUCTURED EXCEPTION HANDLING
C#.NET supports structured exception handling, which enables us to detect and
handle errors at runtime. In C#.NET, we use try-catch-finally statements to
create exception handlers. Using try-catch-finally statements, we can create
robust and effective exception handlers to improve the robustness of our
application.
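For illustration, a minimal C# try-catch-finally sketch; the access.log file name is an assumption for the example:

    using System;
    using System.IO;

    class LogReaderExample
    {
        static void Main()
        {
            StreamReader reader = null;
            try
            {
                reader = new StreamReader("access.log");   // may throw if the file is missing
                Console.WriteLine(reader.ReadLine());
            }
            catch (FileNotFoundException ex)
            {
                Console.WriteLine("Log file not found: " + ex.Message);
            }
            finally
            {
                // Runs whether or not an exception was thrown.
                if (reader != null)
                    reader.Close();
            }
        }
    }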
THE .NET FRAMEWORK
The .NET Framework is a new computing platform that simplifies
application development in the highly distributed environment of the Internet.
OBJECTIVES OF .NET FRAMEWORK

1. To provide a consistent object-oriented programming environment whether
object code is stored and executed locally, executed locally but Internet-distributed,
or executed remotely.
2. To provide a code-execution environment that minimizes software deployment
conflicts and guarantees safe execution of code.
3. To eliminate performance problems.

There are different types of applications, such as Windows-based applications
and Web-based applications.
4.3 FEATURES OF SQL-SERVER
The OLAP Services feature available in SQL Server version 7.0 is now
called SQL Server 2000 Analysis Services. The term OLAP Services has been
replaced with the term Analysis Services. Analysis Services also includes a new
data mining component. The Repository component available in SQL Server
version 7.0 is now called Microsoft SQL Server 2000 Meta Data Services.
References to the component now use the term Meta Data Services. The term
repository is used only in reference to the repository engine within Meta Data
Services.

A SQL Server database consists of several types of objects. They include:

1. TABLE
2. QUERY
3. FORM
4. REPORT
5. MACRO

TABLE:
A table is a collection of data about a specific topic.

VIEWS OF TABLE:
We can work with a table in two views:
1. Design View
2. Datasheet View

DESIGN VIEW
To build or modify the structure of a table we work in the table design view.
We can specify what kind of data each field will hold.

DATASHEET VIEW
To add, edit or analyse the data itself we work in the table's datasheet view
mode.
QUERY:
A query is a question that is asked of the data. Access gathers the data that
answers the question from one or more tables. The data that make up the answer is
either a dynaset (if you edit it) or a snapshot (which cannot be edited). Each time we
run the query, we get the latest information in the dynaset. Access either displays the
dynaset or snapshot for us to view, or performs an action on it, such as deleting or
updating.
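For illustration, a minimal C# (ADO.NET) sketch of running such a query against the project's SQL Server backend; the connection-string pattern and the login1 table follow the sample coding appendix, and the server name is a placeholder:

    using System;
    using System.Data.SqlClient;

    class QueryExample
    {
        static void Main()
        {
            // Connection string follows the pattern used in the appendix code;
            // adjust the server name for your own machine.
            string connectionString =
                "Data Source=.\\SQLEXPRESS;Initial Catalog=web;Integrated Security=True";

            using (var con = new SqlConnection(connectionString))
            using (var cmd = new SqlCommand("SELECT uname FROM login1", con))
            {
                con.Open();
                using (SqlDataReader dr = cmd.ExecuteReader())
                {
                    while (dr.Read())                // iterate over the result set
                        Console.WriteLine(dr["uname"]);
                }
            }
        }
    }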
Literature Survey:

Information Extraction:

Information Extraction (IE) is the name given to any process which
selectively structures and combines data which is found, explicitly stated or
implied, in one or more texts. The final output of the extraction process varies; in
every case, however, it can be transformed so as to populate some type of database.
Information analysts working long term on specific tasks already carry out
information extraction manually with the express goal of database creation.
One reason for interest in IE is its role in evaluating, and comparing,
different Natural Language Processing technologies. Unlike other NLP
technologies, MT for example, the evaluation process is concrete and can be
performed automatically. This, plus the fact that a successful extraction system has
immediate applications, has encouraged research funders to support both
evaluations of and research into IE. It seems at the moment that this funding will
continue and will bring about the existence of working systems. Applications of IE

are still scarce. A few well known examples exist and other classified systems may
also be in operation. It is certainly not true that the level of the technology is such
that it is easy to build systems for new tasks, or that the levels of performance are
sufficiently high for use in fully automatic systems. The effect on long term
research on NLP is debatable and this is considered in the final section which
speculates on future directions in IE. We begin our examination of IE by
considering a specific example from the Fourth Message Understanding
Conference (MUC-4 DARPA 92) evaluation. An examination of the prognosis for
this relatively new, and as yet unproven, language technology follows, together
with a brief history of how IE has evolved. The related problems of
evaluation methodology and task definition are examined. The current methods
used for building IE extraction systems are outlined. The term IE can be applied to
a range of tasks, and we consider three generic applications.
Visual Web Information Extraction with Lixto:

We present new techniques for supervised wrapper generation and
automated web information extraction, and a system called Lixto implementing
these techniques. Our system can generate wrappers which translate relevant pieces
of HTML pages into XML. Lixto, of which a working prototype has been
implemented, assists the user to semi-automatically create wrapper programs by
providing a fully visual and interactive user interface. In this convenient user
interface, very expressive extraction programs can be created. Internally, this
functionality is reflected by the new logic-based declarative language Elog. Users
never have to deal with Elog, and even familiarity with HTML is not required.
Lixto can be used to create an "XML companion" for an HTML web page with

changing content, containing the continually updated XML translation of the


relevant information. Nowadays web content is mainly formatted in HTML.
This is not expected to change soon, even if more flexible languages such as
XML are attracting a lot of attention. While both HTML and XML are languages
for representing semi-structured data, the first is mainly presentation-oriented and is
not really suited for database applications. XML, on the other hand, separates data
structure from layout and provides a much more suitable data representation. A set
of XML documents can be regarded as a database and can be
directly processed by a database application or queried via one of the new query
languages for XML, such as XML-GL, XML-QL and XQuery. As the following
example shows, the lack of accessibility of HTML data for querying has dramatic
consequences on the time and cost spent to retrieve relevant information from web
pages.
Extracting Structured Data from Web Pages:

The World Wide Web is a vast and rapidly growing source of information.
Most of this information is in the form of unstructured text, making the
information hard to query. There are, however, many web sites that have large
collections of pages containing structured data, i.e., data having a structure or a
schema. These pages are typically generated dynamically from an underlying
structured source like a relational database. An example of such a collection is the
set of book pages in Amazon [2] (Figure 1). The data in each book page has the
same schema, i.e., each page contains the title, list of authors, price of a book and
so on. This paper studies the problem of automatically extracting structured data
encoded in a given collection of pages, without any human input like manually
generated rules or training sets. For instance, from a collection of pages like those

in Figure 1 we would like to extract book tuples, where each tuple consists of the
title, the set of authors, the (optional) list price, and other attributes.
Many web sites contain large sets of pages generated using a common
template or layout. For example, Amazon lays out the author, title, comments, etc.
in the same way in all its book pages. The values used to generate the pages (e.g.,
the author, title,...) typically come from a database. In this paper, we study the
problem of automatically extracting the database values from such template
generated web pages without any learning examples or other similar human input.
We formally define a template, and propose a model that describes how values are
encoded into pages using a template. We present an algorithm that takes, as input, a
set of template-generated pages, deduces the unknown template used to generate
the pages, and extracts, as output, the values encoded in the pages. Experimental
evaluation on a large number of real input page collections indicates that our
algorithm correctly extracts data in most cases.
Web Object Retrieval:

The primary function of current Web search engines is essentially relevance
ranking at the document level. However, myriad structured information about
real-world objects is embedded in static Web pages and online Web databases.
Document-level information retrieval can unfortunately lead to highly inaccurate
relevance ranking in answering object-oriented queries. In this paper, we propose a
paradigm shift to enable searching at the object level. In traditional information
retrieval models, documents are taken as the retrieval units and the content of a
document is considered reliable. However, this reliability assumption is no longer
valid in the object retrieval context when multiple copies of information about the
same object typically exist. These copies may be inconsistent because of diversity
of Web site qualities and the limited performance of current information extraction

techniques. If we simply combine the noisy and inaccurate attribute information


extracted from different sources, we may not be able to achieve satisfactory
retrieval performance. In this paper, we propose several language models
for Web object retrieval, namely an unstructured object retrieval model, a
structured object retrieval model, and a hybrid model with both structured and
unstructured retrieval features. We test these models on a paper search engine and
compare their performances. We conclude that the hybrid model is superior, as it
takes into account the extraction errors at varying levels.
Modules:

Authentication
Extracting Structured Data from Web Pages
Information Sources from webpage HTML code extraction
Wrapper process HTML element to XML element from webpage
Natural Language Processing extraction from webpage

[Module flow: User → Webpage → Information Extraction → HTML Code Transform → XML → Information Retrieval from Web → NLP Processing]

Modules Description:

Authentication

This module secures the application from unauthorized persons. Users are asked
to submit their details, which are stored in the database, so that only valid users
can log in to the application.

Extracting Structured Data from Web Pages

This module extracts structured data from the World Wide Web (WWW). The Web
contains huge amounts of data; however, we cannot benefit very much from the
large number of raw web pages unless the information within them is extracted
accurately and organized well. Therefore, information extraction (IE) plays an
important role in Web knowledge discovery and management.

Information Sources from Webpage HTML Code Extraction

We present new techniques for supervised wrapper generation and automated web
information extraction, and a system called Lixto implementing these techniques. Our
system can generate wrappers which translate relevant pieces of HTML pages into XML.
In this convenient user interface, very expressive extraction programs can be created. Web
content is mainly formatted in HTML.
Wrapper Process: HTML Element to XML Element from Webpage

The solution is thus to use wrapper technology to extract the relevant information from
HTML documents and translate it into XML, which can be easily queried or further
processed. Based on a new method of identifying and extracting relevant parts of HTML
documents and translating them to XML format, we designed and implemented an
efficient wrapper generator which is particularly well suited for building HTML/XML
wrappers and introduces new ideas and programming language concepts for wrapper
generation. Once a wrapper is built, it can be applied automatically to continually extract
relevant information from a permanently changing web page.

This is not expected to change soon, even if more flexible languages such as XML are
attracting a lot of attention. While both HTML and XML are languages for representing
semi-structured data, XML is the one better suited for database applications.
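For illustration only, the following minimal C# sketch mimics the wrapper idea on a toy HTML string: two hand-written extraction rules (regular expressions standing in for the visual/Elog rules described above) pull out the page title and hyperlinks and store them as XML. The HTML snippet and element names are invented for the example.

    using System;
    using System.Text.RegularExpressions;
    using System.Xml;

    class SimpleWrapper
    {
        static void Main()
        {
            string html = "<html><head><title>Book Store</title></head>" +
                          "<body><a href=\"/b1\">C# Basics</a><a href=\"/b2\">SQL Primer</a></body></html>";

            var doc = new XmlDocument();
            XmlElement root = doc.CreateElement("Page");
            doc.AppendChild(root);

            // Extraction rule 1: the page title.
            Match title = Regex.Match(html, @"<title>(.*?)</title>", RegexOptions.IgnoreCase);
            XmlElement titleEl = doc.CreateElement("Title");
            titleEl.InnerText = title.Groups[1].Value;
            root.AppendChild(titleEl);

            // Extraction rule 2: every hyperlink becomes an <Item> element.
            foreach (Match m in Regex.Matches(html, @"<a href=""(.*?)"">(.*?)</a>", RegexOptions.IgnoreCase))
            {
                XmlElement item = doc.CreateElement("Item");
                item.SetAttribute("href", m.Groups[1].Value);
                item.InnerText = m.Groups[2].Value;
                root.AppendChild(item);
            }

            Console.WriteLine(doc.OuterXml);   // the XML "companion" of the HTML page
        }
    }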


Natural Language Processing Extraction from Webpage:

Natural Language Processing (NLP) is a field of computer science and linguistics
concerned with the interactions between computers and human (natural) languages.

In theory, natural-language processing is a very attractive method of human-computer
interaction. Natural-language understanding is sometimes referred to as
an AI-complete problem, because natural-language recognition seems to require
extensive knowledge about the outside world and the ability to manipulate it.

NLP has significant overlap with the field of computational linguistics, and is
often considered a sub-field of artificial intelligence.

Technique Used or Algorithm:

We use the HCRF algorithm, which is based on the VIPS approach. HCRF organizes the
segmentation of the page hierarchically to form a tree structure and conducts inference on the
vision tree to tag each vision node (vision block) with a label. The first algorithm is the original
HCRF and extended Semi-CRF framework; we name it the Basic HCRF and extended
Semi-CRF (BHS) algorithm.

5. SYSTEM DESIGN

Use Case Diagram: [actors and use cases: user, webpage, information extraction, HTML transform to XML, label extraction, information retrieval from Web]

Class Diagram: [classes: user, webpage, information extraction, HTML transform XML, labelling extraction, information retrieval from Web]

Sequence Diagram: [message flow: create user → information extraction from webpage → transform HTML into XML → labelling extraction from XML → information retrieval from Web]

Collaboration Diagram: [same interactions as the sequence diagram, numbered 1-5: create user; information extraction from webpage; transform HTML into XML; labelling extraction from XML; information retrieval from Web]

State Diagram: [states: user → webpage → information extraction → HTML transform XML → labelling extraction → retrieval of information from webpage]

Activity Diagram: [activities: user → webpage → information extraction → HTML transform XML → labelling extraction → retrieval of information from webpage]

Component Diagram: [components: user, webpage, information extraction, HTML transform XML, labelling extraction, retrieval of information from webpage]

Object Diagram: [objects: user, webpage, information extraction, HTML transform XML, labelling extraction, retrieval of information from webpage]

System Architecture: [User → Authentication storage → Webpage structure → Information extraction → HTML transform XML → Labelling extraction → Retrieval of information from webpage]

E-R Diagram: [entities: user, webpage structure, authentication storage, information extraction, HTML transforms XML, label extraction, retrieval of information from webpage]

Project Flow Diagram: [User → Webpage → Information extraction → HTML transform XML → Labelling extraction → Retrieval of information from webpage]

Data Flow Diagram: [User → Webpage structure → Information extraction → HTML transforms XML → Label extraction → Retrieval of information from webpage]

6. SYSTEM TESTING AND MAINTENANCE

Testing is vital to the success of the system. System testing makes a logical
assumption that if all parts of the system are correct, the goal will be successfully
achieved. System testing is the stage of implementation which is aimed at ensuring
that the system works accurately and efficiently.

In the testing process we test the actual system in an organization, gather errors
from the new system and take initiatives to correct them. All the front-end and
back-end connectivity is tested to be sure that the new system operates at full
efficiency as stated.

The main objective of testing is to uncover errors in the system. For the
uncovering process we have to give proper input data to the system, so we must be
careful when providing input data; correct inputs are important for efficient testing.

Testing is done for each module. After testing all the modules, the modules are
integrated and testing of the final system is done with test data specially designed
to show that the system will operate successfully under all its expected conditions.
Thus the system testing is a confirmation that all is correct and an opportunity to
show the user that the system works. Inadequate testing or no testing leads to
errors that may appear a few months later.

The testing process focuses on the logical internals of the software, ensuring that
all the statements have been tested, and on the functional externals, i.e., conducting
tests to uncover errors and ensure that defined inputs will produce actual results
that agree with the required results. Testing is done using the two common steps,
unit testing and integration testing. In the project, system testing is made as
follows: the procedure-level testing is made first; by giving improper inputs, the
errors that occur are noted and eliminated. This is the final step in the system life
cycle. Here we implement the tested, error-free system into the real-life environment
and make necessary changes, so that it runs in an online fashion. System
maintenance is done every month or year based on company policies, and the
system is checked for errors such as runtime errors and long-run errors, and other
maintenance such as table verification and reports is carried out.
6.1. UNIT TESTING
Unit testing focuses verification efforts on the smallest unit of software design:
the module. This is also known as module testing. The modules are tested separately.
This testing is carried out during the programming stage itself. In this testing step,
each module is found to be working satisfactorily with regard to the expected output
from the module.
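For illustration, a unit test for a single module might look like the following sketch; the HitParser class and the use of the NUnit framework are assumptions for the example, not part of the delivered project:

    using NUnit.Framework;

    // Module under test (hypothetical): extracts the requested path from a log line.
    public static class HitParser
    {
        public static string ExtractPath(string logLine)
        {
            int start = logLine.IndexOf('"') + 1;
            string request = logLine.Substring(start, logLine.IndexOf('"', start) - start);
            return request.Split(' ')[1];   // "GET /index.html HTTP/1.1" -> "/index.html"
        }
    }

    [TestFixture]
    public class HitParserTests
    {
        [Test]
        public void ExtractPath_ReturnsRequestedPage()
        {
            string line = "127.0.0.1 - - [10/Oct/2013:13:55:36 +0530] \"GET /index.html HTTP/1.1\" 200 2326";
            Assert.AreEqual("/index.html", HitParser.ExtractPath(line));
        }
    }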
6.2. INTEGRATION TESTING
Integration testing is a systematic technique for constructing tests to uncover
errors associated with the interfaces between modules. In the project, all the
modules are combined and then the entire program is tested as a whole. In the
integration testing step, all the errors uncovered are corrected before the next
testing steps.

7. SYSTEM IMPLEMENTATION

Implementation is the stage of the project when the theoretical design is turned
into a working system. It can therefore be considered the most critical stage in
achieving a successful new system and in giving the user confidence that the new
system will work and be effective.

The implementation stage involves careful planning, investigation of the existing
system and its constraints on implementation, the design of methods to achieve the
changeover, and evaluation of the changeover methods.

Implementation is the process of converting a new system design into
operation. It is the phase that focuses on user training, site preparation and file
conversion for installing a candidate system. The important factor that should be
considered here is that the conversion should not disrupt the functioning of the
organization.

Application: The system helps in understanding the information extracted from
the web page structure. Systematic data extraction is useful for collecting the
information needed to understand the webpage.
Conclusion:

This application has attempted to provide an up-to-date survey of the rapidly
growing area of Web usage mining. With the growth of Web-based applications,
specifically electronic commerce, there is significant interest in analyzing Web
usage data to better understand Web usage, and to apply that knowledge to better
serve users. This has led to a number of commercial offerings for doing such
analysis. However, Web usage mining raises some hard scientific questions that
must be answered before robust tools can be developed. For Web usage mining,
the session dissimilarity measure is not a distance metric, and dealing with
relational data is impractical given the huge size of the data sets. Therefore,
evolutionary techniques, which can deal with ill-defined features and
non-differentiable similarity measures, are suitable. Evolutionary techniques can
handle a vast array of subjective, even non-metric dissimilarities, making them
suitable for many applications in data and Web mining. Moreover, they are
meaningful only within well-defined distinct profiles/contexts (context-sensitive)
as opposed to all or none of the data (context-blind). Today's web sites are a source
of an exploding amount of click stream data that can put the scalability of any data
mining technique into question. Moreover, the Web access patterns on a web site
are very dynamic, due not only to the dynamics of Web site content and structure,
but also to changes in the users' interests, and thus their navigation patterns. The
access patterns can be observed to change depending on the time of day, day of
week, and according to seasonal patterns or other events in the world.
BIBLIOGRAPHY

[1] J. Cowie and W. Lehnert, "Information Extraction," Comm. ACM, vol. 39, no. 1, pp. 80-91, 1996.
[2] C. Cardie, "Empirical Methods in Information Extraction," AI Magazine, vol. 18, no. 4, pp. 65-80, 1997.
[3] R. Baumgartner, S. Flesca, and G. Gottlob, "Visual Web Information Extraction with Lixto," Proc. Conf. Very Large Data Bases (VLDB), pp. 119-128, 2001.
[4] A. Arasu and H. Garcia-Molina, "Extracting Structured Data from Web Pages," Proc. ACM SIGMOD, pp. 337-348, 2003.
[5] D.W. Embley, Y.S. Jiang, and Y.-K. Ng, "Record-Boundary Discovery in Web Documents," Proc. ACM SIGMOD, pp. 467-478, 1999.

Screen shot:
Home page:

Login form:

New Registrations Form:-

Fill the values:-

Details Register:-

Website search form:-

Feedback Form:

FAQ Form:-

Sample coding:

Home page:

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Windows.Forms;

namespace Loopunderstanding
{
    public partial class Home : Form
    {
        public Home()
        {
            InitializeComponent();
        }

        // Opens the login form and hides the home page.
        private void button2_Click(object sender, EventArgs e)
        {
            Form1 fm = new Form1();
            fm.Show();
            this.Hide();
        }

        // Asks for confirmation before closing the application.
        private void button5_Click(object sender, EventArgs e)
        {
            if (MessageBox.Show("Do you want to close this application?", "Exit",
                                MessageBoxButtons.YesNo) == DialogResult.Yes)
            {
                Application.Exit();
            }
        }

        private void button3_Click(object sender, EventArgs e)
        {
            MessageBox.Show("Please Login");
        }
    }
}

Login code:

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Windows.Forms;
using System.Data.SqlClient;

namespace Loopunderstanding
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }

        // Validates the user name and password against the login1 table.
        private void button1_Click(object sender, EventArgs e)
        {
            SqlConnection con = new SqlConnection("data source=SPIRO40\\SQLEXPRESS;Initial catalog=web;integrated security=true");
            // Parameterized query avoids SQL injection from the text boxes.
            SqlCommand cmd = new SqlCommand("select * from login1 where uname=@uname and pwd=@pwd", con);
            cmd.Parameters.AddWithValue("@uname", textBox1.Text);
            cmd.Parameters.AddWithValue("@pwd", textBox2.Text);
            con.Open();
            SqlDataReader dr = cmd.ExecuteReader();
            if (dr.Read())
            {
                MessageBox.Show("Login Successful");
                windows fm = new windows();
                fm.Show();
                this.Hide();
            }
            else
            {
                MessageBox.Show("Please enter the correct username and password");
            }
            dr.Close();
            con.Close();
        }

        // Returns to the home page.
        private void button2_Click(object sender, EventArgs e)
        {
            Home pg = new Home();
            pg.Show();
            this.Hide();
        }

        // Opens the registration form.
        private void linkLabel1_LinkClicked(object sender, LinkLabelLinkClickedEventArgs e)
        {
            register rs = new register();
            rs.Show();
            this.Hide();
        }

        private void panel1_Paint(object sender, PaintEventArgs e)
        {
        }

        private void panel1_click(object sender, EventArgs e)
        {
            register rs = new register();
            rs.Show();
            this.Hide();
        }

        // Clears the user name and password fields.
        private void button3_Click(object sender, EventArgs e)
        {
            textBox1.Text = "";
            textBox2.Text = "";
        }
    }
}

Registration code:

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Windows.Forms;
using System.Data.SqlClient;

namespace Loopunderstanding
{
    public partial class register : Form
    {
        public register()
        {
            InitializeComponent();
        }

        SqlConnection con = new SqlConnection("data source=SPIRO40\\SQLEXPRESS;Initial Catalog=web;integrated security=true");

        // Stores the registration details and creates the matching login record.
        private void button1_Click(object sender, EventArgs e)
        {
            if (textBox2.Text == textBox3.Text)
            {
                if (textBox1.Text != "" && textBox2.Text != "" && textBox3.Text != "" &&
                    textBox4.Text != "" && textBox5.Text != "" && textBox6.Text != "")
                {
                    con.Open();
                    // Parameterized inserts avoid SQL injection and quoting errors.
                    SqlCommand cmd = new SqlCommand("insert into register1 values(@a, @b, @c, @d, @e, @f)", con);
                    cmd.Parameters.AddWithValue("@a", textBox1.Text);
                    cmd.Parameters.AddWithValue("@b", textBox2.Text);
                    cmd.Parameters.AddWithValue("@c", textBox3.Text);
                    cmd.Parameters.AddWithValue("@d", textBox4.Text);
                    cmd.Parameters.AddWithValue("@e", textBox5.Text);
                    cmd.Parameters.AddWithValue("@f", textBox6.Text);
                    SqlCommand cmd1 = new SqlCommand("insert into login1 values(@uname, @pwd)", con);
                    cmd1.Parameters.AddWithValue("@uname", textBox1.Text);
                    cmd1.Parameters.AddWithValue("@pwd", textBox2.Text);
                    cmd.ExecuteNonQuery();
                    cmd1.ExecuteNonQuery();
                    MessageBox.Show("Your Details Registered");
                    Form1 fm = new Form1();
                    fm.Show();
                    this.Hide();
                    con.Close();
                }
                else
                {
                    MessageBox.Show("Please fill in all the values");
                }
            }
            else
            {
                MessageBox.Show("The password and confirm password must match");
            }
        }

        // Returns to the login form.
        private void button2_Click(object sender, EventArgs e)
        {
            Form1 pg = new Form1();
            pg.Show();
            this.Hide();
        }
    }
}

Website search:

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Windows.Forms;
using System.Net;
using System.Xml.Linq;
using System.Xml;
using System.IO;

namespace Loopunderstanding
{
    public partial class windows : Form
    {
        public windows()
        {
            InitializeComponent();
        }

        // Loads the requested site in the browser control and shows its HTML source.
        private void button1_Click(object sender, EventArgs e)
        {
            string url = "http://" + textBox1.Text;
            webBrowser1.Url = new Uri(url);
            HttpWebRequest myWebRequest = (HttpWebRequest)HttpWebRequest.Create(url);
            myWebRequest.Method = "GET";
            // Make the request for the web page and read its HTML into the text box.
            HttpWebResponse myWebResponse = (HttpWebResponse)myWebRequest.GetResponse();
            StreamReader myWebSource = new StreamReader(myWebResponse.GetResponseStream());
            textBox2.Text = myWebSource.ReadToEnd();
            myWebSource.Close();
            myWebResponse.Close();
        }

        // Appends the extracted details for the current website to the XML file.
        private void button2_Click(object sender, EventArgs e)
        {
            string path = "C:\\Documents and Settings\\admin\\Desktop\\ITDDM08 FULL\\CODING\\loop understanding\\Loopunderstanding\\webpage_understanding.xml";
            // Create the reader filestream (fs) and load the XML document.
            FileStream fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite);
            System.Xml.XmlDocument CXML = new System.Xml.XmlDocument();
            CXML.Load(fs);
            fs.Close();
            // Create a <Website> node with a <Details> child holding the extracted text.
            XmlElement childNode = CXML.CreateElement("Website");
            XmlNode root = CXML.DocumentElement;
            XmlElement newitem = CXML.CreateElement("Details");
            XmlText textNode = CXML.CreateTextNode(textBox3.Text);
            root.AppendChild(childNode);
            childNode.SetAttribute("Name", textBox1.Text);
            childNode.AppendChild(newitem);
            newitem.AppendChild(textNode);
            // Save the updated XML file and close the writer filestream.
            FileStream WRITER = new FileStream(path, FileMode.Truncate, FileAccess.Write, FileShare.ReadWrite);
            CXML.Save(WRITER);
            WRITER.Close();
            MessageBox.Show("created");
        }

        private void linkLabel2_LinkClicked(object sender, LinkLabelLinkClickedEventArgs e)
        {
            FAQ fq = new FAQ();
            fq.Show();
            this.Hide();
        }

        private void linkLabel1_LinkClicked(object sender, LinkLabelLinkClickedEventArgs e)
        {
            Application.Exit();
        }
    }
}

Feedback code:

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Windows.Forms;
using System.Data.SqlClient;

namespace Loopunderstanding
{
    public partial class Feedback : Form
    {
        public Feedback()
        {
            InitializeComponent();
        }

        // Saves the feedback entered by the user into the feedback table.
        private void button1_Click(object sender, EventArgs e)
        {
            SqlConnection con = new SqlConnection("Data Source=IFRAME3-PC\\SQLEXPRESS;Initial Catalog=web;Integrated Security=True");
            con.Open();
            // Parameterized insert avoids SQL injection and quoting errors.
            SqlCommand cmd = new SqlCommand("insert into feedback values(@a, @b, @c)", con);
            cmd.Parameters.AddWithValue("@a", textBox1.Text);
            cmd.Parameters.AddWithValue("@b", textBox2.Text);
            cmd.Parameters.AddWithValue("@c", textBox3.Text);
            cmd.ExecuteNonQuery();
            con.Close();
            MessageBox.Show("Your feedback was submitted successfully. Thank you.");
        }

        private void linkLabel1_LinkClicked(object sender, LinkLabelLinkClickedEventArgs e)
        {
            Application.Exit();
        }
    }
}

FAQ code:

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Windows.Forms;
using System.IO;

namespace Loopunderstanding
{
    public partial class FAQ : Form
    {
        // Shared location of the FAQ answer text files.
        private const string FaqFolder = "\\Iframe2-pc\\d\\2013 - 2014\\Own Concept\\Dotnet\\Diploma\\KPC\\Abarna Sampath\\ITDDM08\\ITDDM08 FULL\\CODING\\loop understanding\\Loopunderstanding\\faq\\";

        public FAQ()
        {
            InitializeComponent();
        }

        // Reads one FAQ answer file from the shared folder.
        private string ReadAnswer(string fileName)
        {
            return File.ReadAllText(FaqFolder + fileName);
        }

        private void linkLabel1_LinkClicked(object sender, LinkLabelLinkClickedEventArgs e)
        {
            Feedback fb = new Feedback();
            fb.Show();
            this.Hide();
        }

        private void linkLabel2_LinkClicked(object sender, LinkLabelLinkClickedEventArgs e)
        {
            Application.Exit();
        }

        // Each link shows the answer text for one frequently asked question.
        private void linkLabel3_LinkClicked(object sender, LinkLabelLinkClickedEventArgs e)
        {
            label3.Text = ReadAnswer("q1.txt");
        }

        private void linkLabel4_LinkClicked(object sender, LinkLabelLinkClickedEventArgs e)
        {
            label4.Text = ReadAnswer("webpageunder.txt");
        }

        private void linkLabel5_LinkClicked(object sender, LinkLabelLinkClickedEventArgs e)
        {
            label5.Text = ReadAnswer("nlp.txt");
        }

        private void linkLabel6_LinkClicked(object sender, LinkLabelLinkClickedEventArgs e)
        {
            label6.Text = ReadAnswer("IE.txt");
        }

        private void linkLabel7_LinkClicked(object sender, LinkLabelLinkClickedEventArgs e)
        {
            label7.Text = ReadAnswer("html convert xml.txt");
        }

        private void linkLabel8_LinkClicked(object sender, LinkLabelLinkClickedEventArgs e)
        {
            label8.Text = ReadAnswer("purpose.txt");
        }

        private void linkLabel9_LinkClicked(object sender, LinkLabelLinkClickedEventArgs e)
        {
            label9.Text = ReadAnswer("source code.txt");
        }

        private void linkLabel11_LinkClicked(object sender, LinkLabelLinkClickedEventArgs e)
        {
            label11.Text = ReadAnswer("technique.txt");
        }

        private void linkLabel12_LinkClicked(object sender, LinkLabelLinkClickedEventArgs e)
        {
            label12.Text = ReadAnswer("kind.txt");
        }

        private void linkLabel10_LinkClicked(object sender, LinkLabelLinkClickedEventArgs e)
        {
            label10.Text = ReadAnswer("uses webpge.txt");
        }
    }
}
