FiVaTech Page-Level Web Data Extraction from Template Pages

CHAPTER - 1
INTRODUCTION
1.1 Scope:
This document plays a vital role in the software development life cycle (SDLC): it
describes the complete requirements of the system. It is meant for use by the
developers and will be the basis during the testing phase. Any changes made to the
requirements in the future will have to go through a formal change approval process.
1.2 Objective:
In this project, we focus on page-level extraction tasks and propose a new
approach, called FiVaTech, to automatically detect the schema of a Website. We
formulate the page generation model using an encoding scheme based on tree
templates and schema, which organize data by their parent node in the DOM trees.
1.3 Description of the Project:
The Deep Web is widely understood to contain orders of magnitude more, and more
valuable, information than the surface Web. However, making use of such consolidated
information requires substantial effort, since the pages are generated for visualization,
not for data exchange. Thus, extracting information from Web pages for searchable
Websites has been a key step for Web information integration. Generating an
extraction program for a given search form is equivalent to wrapping a data source
such that all extractor or wrapper programs return data in the same format for
information integration. An important characteristic of pages belonging to the same
Website is that such pages share the same template, since they are encoded in a
consistent manner across all the pages. In other words, these pages are generated from
a predefined template by plugging in data values. In practice, template pages can also
occur in the surface Web (with static hyperlinks).
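The observation that pages of one Website share a template can be illustrated with a
deliberately simplified sketch. It is not the FiVaTech algorithm (which relies on tree
alignment and pattern mining). It assumes that the pages are already aligned, so that
their text nodes appear in the same order, and it labels a text slot as fixed (template)
when it is identical on every page and as variant (data) otherwise. The names
PathTextCollector, path_texts and split_fixed_variant are hypothetical.

```python
from html.parser import HTMLParser


class PathTextCollector(HTMLParser):
    """Collects (tag path, text) pairs for every non-empty text node."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open tags, outermost first
        self.items = []   # (path, text) pairs in document order

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Pop up to and including the matching open tag.
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.items.append(("/".join(self.stack), text))


def path_texts(html):
    collector = PathTextCollector()
    collector.feed(html)
    collector.close()
    return collector.items


def split_fixed_variant(pages):
    """Label each aligned text slot as fixed (template) or variant (data)."""
    rows = [path_texts(page) for page in pages]
    # Assumption: every page yields the same number of text slots.
    if len({len(row) for row in rows}) != 1:
        raise ValueError("pages are not aligned; a real system would align them first")
    fixed, variant = [], []
    for slot in zip(*rows):
        path = slot[0][0]
        texts = [text for _, text in slot]
        if len(set(texts)) == 1:
            fixed.append((path, texts[0]))
        else:
            variant.append((path, texts))
    return fixed, variant
```

Given two product pages that differ only in the product name, the heading and the field
label would be reported as fixed and the name as variant.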
In this paper, we focus on page-level extraction tasks and propose a new
approach, called FiVaTech, to automatically detect the schema of a Website. The
proposed technique presents a new structure, called the fixed/variant pattern tree, a tree
that carries all of the information needed to identify the template and detect
the data schema. We combine several techniques (alignment, pattern mining, and
the idea of tree templates) to solve the much more difficult problem of page-level
template construction. In experiments, FiVaTech has much higher precision than
EXALG, one of the few page-level extraction systems, and is comparable with other
record-level extraction systems like ViPER and MSE.
MCA, MITS, 2012
1.4 Introduction to Modules:
The following modules are involved in this project:
1. Login
2. Admin
3. User
Modules description:
1. Login:
In this module the admin supplies a user ID and password to enter the
admin home page. If the user ID or password is incorrect, access to the
home page is denied.
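A minimal sketch of the credential check described above. The report does not say how
credentials are stored, so the salted PBKDF2 hash and the in-memory dictionary standing
in for the user table are assumptions, and the names hash_password, verify_password and
login are hypothetical.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor, not taken from the report


def hash_password(password, salt=None):
    """Return (salt, digest) for a password using PBKDF2-HMAC-SHA256."""
    salt = os.urandom(16) if salt is None else salt
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password, salt, expected_digest):
    """Recompute the digest and compare it in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected_digest)


def login(users, user_id, password):
    """Return True only when the user exists and the password matches."""
    record = users.get(user_id)
    if record is None:
        return False
    salt, digest = record
    return verify_password(password, salt, digest)
```

A caller would redirect to the admin home page when login returns True and show the
login form again otherwise.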
2. Admin:
The admin logs in to the application with a username and password.
The admin can add products by category and can also view the
users' feedback.
3. User:
In this module the user views the product details and can then extract the
details as required. Multiple products are displayed in a
single page, and the data is extracted from that single page only.
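The extraction step in the User module can be sketched as follows. This is an
illustration only: it assumes a hypothetical page layout in which each product is one
table row with three cells (name, category, price), and ProductRowParser and
extract_products are made-up names rather than parts of the project.

```python
from html.parser import HTMLParser


class ProductRowParser(HTMLParser):
    """Collects the text of the <td> cells of every table row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None    # cells of the row being read, or None
        self._cell = None   # text chunks of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:           # header rows made of <th> stay empty
                self.rows.append(self._row)
            self._row = None


def extract_products(html, fields=("name", "category", "price")):
    """Return one dict per product row found in a single page."""
    parser = ProductRowParser()
    parser.feed(html)
    parser.close()
    return [dict(zip(fields, row)) for row in parser.rows if len(row) == len(fields)]
```

All records come from one page, which matches the module's behaviour of displaying
several products together and extracting from that page alone.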
CHAPTER - 2
SYSTEM ANALYSIS
2.1 INTRODUCTION
2.1.1. Feasibility Study
A preliminary investigation examines project feasibility: the likelihood that the system
will be useful to the organization. The main objective of the feasibility study is to test
the technical, operational and economical feasibility of adding new modules and
debugging the old running system. Any system is feasible given unlimited resources
and infinite time. There are three aspects in the feasibility study portion of the preliminary
investigation:
1. Technical Feasibility
2. Operational Feasibility
3. Economical Feasibility
2.1.1.1. Technical feasibility
The technical issues usually raised during the feasibility stage of the
investigation include the following:
Does the necessary technology exist to do what is suggested?
Does the proposed equipment have the technical capacity to hold the data
required to use the new system?
Will the proposed system provide adequate response to inquiries, regardless of
the number or location of users?
Can the system be upgraded if developed?
Are there technical guarantees of accuracy, reliability, ease of access and data
security?
Earlier, no system existed to cater to the needs of the "Secure Infrastructure
Implementation System". The current system developed is technically feasible. It is a
web-based user interface for audit workflow at NIC-CSD, and thus provides easy
access to the users. The database's purpose is to create, establish and maintain a
workflow among various entities in order to facilitate all concerned users in their
various capacities or roles. Permission would be granted to users based on the
roles specified. Therefore, it provides the technical guarantee of accuracy,
reliability and security. The software and hardware requirements for the development of
this project are few, and are either already available in-house at NIC or available
free as open source. The work for the project is done with the current equipment and
existing software technology. The necessary bandwidth exists for providing fast
feedback to the users irrespective of the number of users using the system.
2.1.1.2. Operational feasibility
Proposed projects are beneficial only if they can be turned into information
systems that meet the organization's operating requirements. Operational
feasibility aspects of the project are to be taken as an important part of the project
implementation. Some of the important issues raised to test the operational
feasibility of a project include the following:
Is there sufficient support for the management from the users?
Will the system be used and work properly if it is developed and
implemented?
Will there be any resistance from the users that will undermine the possible
application benefits?
This system is targeted to be in accordance with the above-mentioned issues.
The management issues and user requirements have been taken into
consideration beforehand, so there is no question of resistance from the users that can undermine
the possible application benefits.
The well-planned design would ensure the optimal utilization of the computer
resources and would help in the improvement of performance status.
2.1.1.3. Economical Feasibility
A system that can be developed technically, and that will be used if installed, must
still be a good investment for the organization. In the economical feasibility study, the
development cost of creating the system is evaluated against the ultimate benefit
derived from the new system. Financial benefits must equal or exceed the costs.
The system is economically feasible. It does not require any additional hardware
or software. Since the interface for this system is developed using the existing
resources and technologies available at NIC, the expenditure is nominal and
the system is certainly economically feasible.
2.2. Existing System
The existing system does not automatically detect the schema of a Website, and it
uses a fixed pattern tree. Template construction is difficult in the existing system. Generally
speaking, templates, as a common model for all pages, occur quite fixed as opposed to
data values, which vary across pages. Finding such a common template requires
multiple pages, or a single page containing multiple records, as input. When multiple
pages are given, the extraction target aims at page-wide information (e.g., RoadRunner
and EXALG). When single pages are given, the extraction target is usually
constrained to record-wide information, which involves the additional issue of
record-boundary detection.
Solution Of These Problems
The proposed technique presents a new structure, called the fixed/variant pattern
tree, a tree that carries all of the information needed to identify the template
and detect the data schema. We combine several techniques (alignment, pattern
mining, and the idea of tree templates) to solve the much more difficult problem of
page-level template construction. In experiments, FiVaTech has much higher precision than
EXALG, one of the few page-level extraction systems, and is comparable with other
record-level extraction systems like ViPER and MSE.
2.3 Proposed System
In this paper, we focus on page-level extraction tasks and propose a new
approach, called FiVaTech, to automatically detect the schema of a Website. The
proposed technique presents a new structure, called the fixed/variant pattern tree, a tree
that carries all of the information needed to identify the template and detect
the data schema. We combine several techniques (alignment, pattern mining, and
the idea of tree templates) to solve the much more difficult problem of page-level
template construction. In experiments, FiVaTech has much higher precision than
EXALG, one of the few page-level extraction systems, and is comparable with other
record-level extraction systems like ViPER and MSE.
2.4 Software and Hardware Requirement Specifications
Software Requirements:
Visual Studio 2010.
SQL Server 2008.
Windows XP operating system.
Internet Explorer 6.0 browser.
Hardware Requirements:
PIV 2.8 GHz processor and above
RAM 1 GB and above
HDD 40 GB hard disk space and above
CHAPTER - 3
SYSTEM DESIGN
3.1. INTRODUCTION
Software design sits at the technical kernel of the software engineering process
and is applied regardless of the development paradigm and area of application. Design
is the first step in the development phase for any engineered product or system. The
designer's goal is to produce a model or representation of an entity that will later be
built. Once system requirements have been specified and analyzed, system
design is the first of the three technical activities (design, code and test) that are required
to build and verify software.
The importance of design can be stated with a single word: "Quality". Design is the place
where quality is fostered in software development. Design provides us with
representations of software that can be assessed for quality. Design is the only way that we
can accurately translate a customer's view into a finished software product or system.
Software design serves as a foundation for all the software engineering steps that
follow. Without a strong design we risk building an unstable system: one that will be
difficult to test, and one whose quality cannot be assessed until the last stage.
During design, progressive refinements of data structure, program structure,
and procedural details are developed, reviewed and documented. System design can be
viewed from either a technical or a project management perspective. From the technical
point of view, design comprises four activities: architectural design, data
structure design, interface design and procedural design.
3.2. Design Principles:
Basic design principles enable the software engineer to navigate the
design process:
1. The design process should not suffer from "Tunnel Vision".
2. The design process should be traceable to the analysis model.
3. The design should not reinvent the wheel.
4. The design should minimize the intellectual distance between the software
and the problem, as it exists in the real world.
5. The design should exhibit uniformity and integrity.
6. The design should be structured to accommodate changes.
7. The design is not coding, and the coding is not a design.
3.3. Design Methodology:
Design methodology follows two approaches: the top-down and the bottom-up
approach. Top-down and bottom-up are strategies of information processing and
knowledge ordering, mostly involving software. In practice, they can be seen as a
style of thinking and teaching. In many cases top-down is used as a synonym of
analysis or decomposition, and bottom-up of synthesis.
A top-down approach is essentially breaking down a system to gain insight into
its compositional sub-systems. In a top-down approach an overview of the system is
first formulated, specifying but not detailing any first-level subsystems. Each
subsystem is then refined in yet greater detail, sometimes in many additional
subsystem levels, until the entire specification is reduced to base elements.
A bottom-up approach is piecing together systems to give rise to grander
systems, thus making the original systems sub-systems of the emergent system. In a
bottom-up approach the individual base elements of the system are first specified in
great detail. These elements are then linked together to form larger subsystems, which
then in turn are linked, sometimes in many levels, until a complete top-level system is
formed. This strategy often resembles a "seed" model, whereby the beginnings are
small but eventually grow in complexity and completeness.
SDLC methodologies
This document plays a vital role in the software development life cycle (SDLC), as it
describes the complete requirements of the system. It is meant for use by developers and
will be the basis during the testing phase. Any changes made to the requirements in the
future will have to go through a formal change approval process.
Spiral model:
This model was defined by Barry Boehm in his 1988 article, "A Spiral Model
of Software Development and Enhancement". This model was not the first model to
discuss iterative development, but it was the first model to explain why the iteration
matters.
As originally envisioned, the iterations were typically 6 months to 2 years
long. Each phase starts with a design goal and ends with the client reviewing the
progress thus far. Analysis and engineering efforts are applied at each phase of the
project, with an eye toward the end goal of the project.
The following diagram shows how a spiral model works:
Fig 3.3.2. Spiral model
The steps for the Spiral Model can be generalized as follows:
The new system requirements are defined in as much detail as possible.
This usually involves interviewing a number of users representing all the
external or internal users and other aspects of the existing system.
A preliminary design is created for the new system.
A first prototype of the new system is constructed from the preliminary
design. This is usually a scaled-down system, and represents an
approximation of the characteristics of the final product.
A second prototype is evolved by a fourfold procedure:
1. Evaluating the first prototype in terms of its strengths, weaknesses,
and risks.
2. Defining the requirements of the second prototype.
3. Planning and designing the second prototype.
4. Constructing and testing the second prototype.
At the customer's option, the entire project can be aborted if the risk is
deemed too great. Risk factors might involve development cost overruns,
operating-cost miscalculation, or any other factor that could, in the
customer's judgment, result in a less-than-satisfactory final product.
The existing prototype is evaluated in the same manner as was the
previous prototype, and if necessary, another prototype is developed from
it according to the fourfold procedure outlined above.
The preceding steps are iterated until the customer is satisfied that the
refined prototype represents the final product desired.
The final system is constructed, based on the refined prototype.
The final system is thoroughly evaluated and tested. Routine maintenance
is carried out on a continuing basis to prevent large-scale failures and to
minimize down time.
3.4 DFD/ER/UML DIAGRAMS
3.4.1. DATA FLOW DIAGRAMS
A data flow diagram (DFD) is a graphical tool used to describe and analyze the movement
of data through a system. DFDs are the central tool and the basis from which the
other components are developed. The transformation of data from input to output,
through processes, may be described logically and independently of the physical
components associated with the system. These are known as logical data flow
diagrams. Physical data flow diagrams show the actual implementation and
movement of data between people, departments and workstations. A full description
of a system actually consists of a set of data flow diagrams. Data flow diagrams are
developed using two familiar notations: Yourdon, and Gane and Sarson. Each
component in a DFD is labeled with a descriptive name. Each process is further identified
with a number that is used for identification purposes. The development of
DFDs is done in several levels. Each process in lower-level diagrams can be broken
down into a more detailed DFD in the next level. The top-level diagram is often
called the context diagram. It consists of a single process, which plays a vital role in
studying the current system. The process in the context-level diagram is exploded
into other processes at the first-level DFD.
The idea behind the explosion of a process into more processes is that
understanding at one level of detail is exploded into greater detail at the next level.
This is done until no further explosion is necessary and an adequate amount of detail is
described for the analyst to understand the process.
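The levelled numbering that results from exploding processes can be sketched with a
small recursive function. This is an illustration under assumptions: the nested
dictionary, the process names and the function number_processes are hypothetical, and
the dotted numbering scheme (1, 1.1, 1.2, ...) is the common convention rather than
something the report prescribes.

```python
def number_processes(tree, prefix=""):
    """Assign dotted numbers to processes; children refine their parent.

    `tree` maps a process name to a dict of its sub-processes
    (an empty dict when the process is not exploded further).
    """
    numbered = {}
    for index, (name, children) in enumerate(tree.items(), start=1):
        number = f"{prefix}{index}"
        numbered[number] = name
        # Each explosion adds one more level to the number.
        numbered.update(number_processes(children, prefix=number + "."))
    return numbered
```

For example, a first-level process "Verify Login" exploded into "Read Credentials" and
"Check Password" would receive the numbers 1, 1.1 and 1.2.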
Larry Constantine first developed the DFD as a way of expressing system
requirements in a graphical form; this led to modular design.
A DFD, also known as a "bubble chart", has the purpose of clarifying system
requirements and identifying major transformations that will become programs in
system design. So it is the starting point of the design, down to the lowest level of detail. A
DFD consists of a series of bubbles joined by data flows in the system.
DFD symbols:
In the DFD, there are four symbols:
1. A square defines a source (originator) or destination of system data.
2. An arrow identifies data flow. It is the pipeline through which the information
flows.
3. A circle or a bubble represents a process that transforms incoming data flows into
outgoing data flows.
4. An open rectangle is a data store: data at rest or a temporary repository of data.
[Figure: the four DFD symbols, labelled "Process that transforms data flow",
"Source or Destination of data", "Data flow" and "Data Store"]
3.4.1.1. Constructing a DFD:
Several rules of thumb are used in drawing DFDs:
1. Processes should be named and numbered for easy reference. Each name should
be representative of the process.
2. The direction of flow is from top to bottom and from left to right. Data
traditionally flow from the source to the destination, although they may flow back to
the source. One way to indicate this is to draw a long flow line back to the source.
An alternative way is to repeat the source symbol as a destination. Since it is used
more than once in the DFD, it is marked with a short diagonal.
3. When a process is exploded into lower-level details, they are numbered.
4. The names of data stores and destinations are written in capital letters. Process and
data flow names have the first letter of each word capitalized.
A DFD typically shows the minimum contents of a data store. Each data store
should contain all the data elements that flow in and out.
Questionnaires should contain all the data elements that flow in and out.
Missing interfaces, redundancies and the like are then accounted for, often through
interviews.
3.4.1.2. Salient Features Of DFD's
1. The DFD shows the flow of data, not of control. Loops and decisions are control
considerations and do not appear on a DFD.
2. The DFD does not indicate the time factor involved in any process, that is, whether the
data flow takes place daily, weekly, monthly or yearly.
3. The sequence of events is not brought out on the DFD.
3.4.1.3. Types Of Data Flow Diagrams
1. Current Physical
2. Current Logical
3. New Logical
4. New Physical
1. Current physical:
In a current physical DFD, process labels include the names of people or their
positions, or the names of computer systems that might provide some of the overall
system processing. A label also includes an identification of the technology used to process
the data. Similarly, data flows and data stores are often labeled with the names of the
actual physical media on which data are stored, such as file folders, computer files,
business forms or computer tapes.
2. Current logical:
The physical aspects of the system are removed as much as possible, so that the
current system is reduced to its essence: the data and the processes that transform
them, regardless of actual physical form.
3. New logical:
This is exactly like the current logical model if the user were completely happy
with the functionality of the current system but
had problems with how it was implemented. Typically, though, the new logical model
will differ from the current logical model by having additional functions, obsolete
functions removed and inefficient flows reorganized.
4. New physical:
The new physical represents only the physical implementation of the new
system.
3.4.1.3. Rules Governing the DFD's
Process
1. No process can have only outputs.
2. No process can have only inputs. If an object has only inputs then it must be a
sink.
3. A process has a verb phrase label.
Data Store
1. Data cannot move directly from one data store to another data store; a process
must move the data.
2. Data cannot move directly from an outside source to a data store; a process
that receives the data from the source must place it into the data
store.
3. A data store has a noun phrase label.
Source Or Sink
The origin and/or destination of data.
1. Data cannot move directly from a source to a sink; it must be moved by a process.
2. A source and/or sink has a noun phrase label.
Data Flow
1. A data flow has only one direction of flow between symbols. It may flow in
both directions between a process and a data store to show a read before an
update. The latter is usually indicated, however, by two separate arrows, since
these happen at different times.
2. A join in a DFD means that exactly the same data comes from any of two or
more different processes, data stores or sinks to a common location.
3. A data flow cannot go directly back to the same process it leaves. There must
be at least one other process that handles the data flow, produces some other data
flow, and returns the original data to the beginning process.
4. A data flow to a data store means update (delete or change).
5. A data flow from a data store means retrieve or use.
A data flow has a noun phrase label. More than one data flow noun phrase can
appear on a single arrow, as long as all of the flows on the same arrow move together
as one package.
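Several of the rules above are mechanical enough to check in code. The following sketch
assumes a hypothetical representation (a dictionary of node kinds and a list of
labelled flows) and a hypothetical function check_dfd. It covers only the structural
rules: every flow must touch a process, a flow may not loop straight back to the node it
leaves, and every process needs at least one input and one output.

```python
def check_dfd(nodes, flows):
    """Return a list of rule violations for a simple DFD description.

    nodes: dict mapping a node name to "process", "store" or "external"
           (an external is a source or sink).
    flows: list of (source, destination, label) tuples.
    """
    problems = []
    for src, dst, label in flows:
        if src == dst:
            problems.append(f"flow '{label}' goes directly back to {src}")
        elif "process" not in (nodes[src], nodes[dst]):
            # Covers store-to-store, source-to-store and source-to-sink flows.
            problems.append(
                f"flow '{label}' links {nodes[src]} {src} to "
                f"{nodes[dst]} {dst} without a process"
            )
    for name, kind in nodes.items():
        if kind != "process":
            continue
        if not any(dst == name for _, dst, _ in flows):
            problems.append(f"process {name} has no input flow")
        if not any(src == name for src, _, _ in flows):
            problems.append(f"process {name} has no output flow")
    return problems
```

Labelling rules (verb phrases for processes, noun phrases for stores and flows) are left
out because they cannot be verified reliably without language analysis.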
Context Level (0th Level) DFD
Description: This is the parent level of the remaining DFDs. It shows how the data is
processed from input to output.
Fig 3.4.1. Context level Data Flow Diagram
First level DFD:
It is drawn from the context level. This DFD describes the user entering the website
with his credentials.
Fig 3.4.2. First level DFD
Second Level DFD:
It is drawn from the context level. This DFD describes the administrator entering
the website with his credentials.
Fig 3.4.3. Second level DFD
3.4.2. ER Diagrams
The relations within the system are structured through a conceptual ER
diagram, which specifies not only the existing entities but also the standard
relations through which the system exists and the cardinalities that are necessary
for the system state to continue.
The Entity Relationship Diagram (ERD) depicts the relationships between the data
objects. The ERD is the notation used to conduct the data modeling activity;
the attributes of each data object noted in the ERD can be described using a data
object description.
The set of primary components that are identified by the ERD are:
Data objects
Relationships
Attributes
Various types of indicators
The primary purpose of the ERD is to represent data objects and their relationships.
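As a rough illustration of how ERD components map to code, the sketch below models data
objects as classes, attributes as fields, and a one-to-many relationship as a list. The
entities and attributes (Product, Feedback, User and their fields) are assumptions
inferred from the module descriptions; the report's actual ER diagram is not reproduced
here.

```python
from dataclasses import dataclass, field


@dataclass
class Product:
    """Data object: a product added by the admin under a category."""
    product_id: int
    name: str
    category: str


@dataclass
class Feedback:
    """Data object: feedback text written by one user."""
    feedback_id: int
    user_id: int      # refers back to the owning User
    text: str


@dataclass
class User:
    """Data object with a one-to-many relationship to Feedback."""
    user_id: int
    username: str
    feedback: list = field(default_factory=list)

    def give_feedback(self, feedback_id, text):
        entry = Feedback(feedback_id, self.user_id, text)
        self.feedback.append(entry)
        return entry
```

The cardinality shows up in the types: a User holds a list of Feedback entries, while
each Feedback carries exactly one user_id.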
3.4.2. UML diagrams
The Unified Modeling Language (UML) is used to specify, visualize, modify,
construct and document the artifacts of an object-oriented software-intensive system
under development. The UML uses mostly graphical notations to express the design
of software projects. UML offers a standard way to visualize a system's architectural
blueprints, including elements such as:
actors
business processes
(logical) components
activities
programming language statements
database schemas, and
reusable software components.
Fig 3.4.4. Overview of Design
UML Diagrams Overview:
UML combines the best techniques from data modeling (entity relationship
diagrams), business modeling (work flows), object modeling, and component
modeling. It can be used with all processes, throughout the software development life
cycle, and across different implementation technologies. UML has synthesized the
notations of the Booch method, the Object-modeling technique (OMT) and Object-
oriented software engineering (OOSE) by fusing them into a single, common and
widely usable modeling language. UML aims to be a standard modeling language
which can model concurrent and distributed systems.
3.4.2.1 Things in UML

Things are the abstractions that are first-class citizens in a model. Relationships
tie these things together. Diagrams group interesting collections of things. There
are four kinds of things in the UML:
1. Structural things
2. Behavioral things
3. Grouping things
4. Annotational things
These things are the basic object-oriented building blocks of the UML. They
are used to write well-formed models.
1. Structural Things
Structural things are the nouns of the UML models. These are mostly static
parts of the model, representing elements that are either conceptual or physical. In all,
there are seven kinds of structural things.
a) Class
A class is a description of a set of objects that share the same attributes,
operations, relationships, and semantics. A class implements one or more interfaces.
Graphically, a class is rendered as a rectangle, usually including its name, attributes
and operations, as shown below.
Window
origin
Size
Open()
Close()
Display()
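The Window class in the figure could be written in code roughly as follows. This is a
sketch: the figure only gives the names of the attributes and operations, so the
default values, the is_open flag and the text returned by display are assumptions.

```python
class Window:
    """Mirrors the class box: attributes origin and size, three operations."""

    def __init__(self, origin=(0, 0), size=(640, 480)):
        self.origin = origin      # attribute from the figure
        self.size = size          # attribute from the figure
        self.is_open = False      # assumed internal state

    def open(self):
        self.is_open = True

    def close(self):
        self.is_open = False

    def display(self):
        state = "open" if self.is_open else "closed"
        width, height = self.size
        return f"Window at {self.origin}, {width}x{height} ({state})"
```

The three compartments of the UML rectangle correspond to the class name, the
attributes set in the constructor, and the methods.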
b) Interface
An interface is a collection of operations that specify a service of a class or
component. An interface describes the externally visible behavior of that element.
Graphically, the interface is rendered as a circle together with its name.

c) Collaboration
Collaboration defines an interaction and is a society of roles and other
elements that work together to provide some cooperative behavior that's bigger than
the sum of all the elements. Graphically, a collaboration is rendered as an ellipse with
dashed lines, usually including only its name, as shown below.
d) Use Case
A use case is a description of a set of sequences of actions that a system performs
that yields an observable result of value to a particular actor. Graphically,
a use case is rendered as an ellipse with solid lines, usually including only its
name, as shown below.
e) Component
A component is a physical and replaceable part of a system that conforms to and
provides the realization of a set of interfaces. Graphically, a component is rendered as
a rectangle with tabs, usually including only its name, as shown below.
[Figure labels: collaboration "Chain of Responsibility", use case "Place Order",
component "orderform.java"]
f) Node
A node is a physical element that exists at run time and represents a
computational resource, generally having at least some memory and, often, processing
capability. Graphically, a node is rendered as a cube, usually including only its name,
as shown below.

server
2. Behavioral Things
Behavioral things are the dynamic parts of UML models. These are the verbs
of a model, representing behavior over time and space.
a) Interaction
An interaction is a behavior that comprises a set of messages exchanged
among a set of objects within a particular context to accomplish a specific purpose.
Graphically, a message is rendered as a directed line, almost always including the name
of its operation, as shown below.
Display
b) State Machine
A state machine is a behavior that specifies the sequence of states an object or
an interaction goes through during its lifetime in response to events, together with its
responses to those events. Graphically, a state is rendered as a rounded rectangle, usually
including its name and its sub-states, if any, as shown below.
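A state machine of this kind can be sketched as a transition table. The states, events
and the class name ExtractionJob below are hypothetical and chosen only for
illustration; they are not taken from the project's design.

```python
# (current state, event) -> next state; all names are assumed.
TRANSITIONS = {
    ("Waiting", "request"): "Extracting",
    ("Extracting", "finished"): "Waiting",
    ("Extracting", "error"): "Waiting",
}


class ExtractionJob:
    """Tracks the sequence of states an object goes through."""

    def __init__(self, initial="Waiting"):
        self.state = initial
        self.history = [initial]

    def handle(self, event):
        """Respond to an event by moving to the next state, if one is defined."""
        next_state = TRANSITIONS.get((self.state, event))
        if next_state is None:
            raise ValueError(f"event '{event}' is not valid in state '{self.state}'")
        self.state = next_state
        self.history.append(next_state)
        return next_state
```

Keeping the transitions in one table makes it easy to compare the code against a state
chart diagram, since each table entry corresponds to one arrow.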
3. Grouping Things
Grouping things are the organizational parts of the UML models. These are the
boxes into which a model can be decomposed.
Package
A package is a general-purpose mechanism for organizing elements into groups.

Business Rules
4. Annotational Things
Annotational things are the explanatory parts of the UML models.
Note
A note is simply a symbol for rendering constraints and comments attached to
an element or a collection of elements. Graphically, a note is rendered as a rectangle
with a dog-eared corner, together with a textual or graphical comment, as shown below.
3.4.2.2. Relationships in the UML
The word "Notation" in UML refers to the set of symbols that are used to
represent a system. These symbols play a vital role in defining a system. Based on
these notations, UML defines four relationships:
1. Dependency
2. Association
3. Generalization
4. Realization
1. Dependency:
A "Dependency" between two entities is a relationship in which
a change to one entity may affect the other entity. The dependency
relationship is represented by a dashed
arrow proceeding in one direction.
2. Association:
A structural relationship that shows a connection among objects is called an
"Association". It is represented as a solid line.

3. Generalization:
Generalization is termed a "Specialized Relationship". In this relationship,
the objects of one entity can be substituted with the objects of another entity. The
entity whose objects are substituted is known as the parent entity, and the entity
that provides the objects for replacement is known as the child entity. It is represented
as a solid line with a hollow arrowhead pointing to the parent.
4. Realization:
Realization is a relationship between classifiers in which one classifier lays
down a contract and another classifier guarantees to carry out this contract.
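The four relationships have rough counterparts in code. The sketch below is an
illustration with hypothetical class names (Extractor, LineExtractor, Account,
AdminAccount, Session) that do not come from the project: an abstract base class plays
the contract in a realization, inheritance expresses generalization, a stored reference
expresses association, and a function that merely uses a type depends on it.

```python
from abc import ABC, abstractmethod


class Extractor(ABC):
    """The contract that a realization must carry out."""

    @abstractmethod
    def extract(self, page):
        """Return the data items found in a page."""


class LineExtractor(Extractor):
    """Realization: guarantees to carry out the Extractor contract."""

    def extract(self, page):
        return [line.strip() for line in page.splitlines() if line.strip()]


class Account:
    """Parent entity in a generalization."""

    def __init__(self, username):
        self.username = username

    def role(self):
        return "user"


class AdminAccount(Account):
    """Child entity: usable wherever an Account is expected."""

    def role(self):
        return "admin"


class Session:
    """Association: a Session keeps a structural link to one Account."""

    def __init__(self, account):
        self.account = account


def summarize(session, extractor, page):
    """Dependency: a change to Extractor.extract may force a change here."""
    items = extractor.extract(page)
    account = session.account
    return f"{account.username} ({account.role()}): {len(items)} items"
```

Because AdminAccount is a child of Account, a Session built with either one works
unchanged, which is the substitution that generalization describes.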
3.4.2.3. Diagrams in the UML
Structural Diagrams:
The structural diagrams are of four types. They are as follows.
a. Class diagrams
b. Object diagrams
c. Component diagrams
d. Deployment diagrams
a) Class Diagrams
Class diagrams are the most common diagrams found in modeling object-
oriented systems. A class diagram shows a set of classes, interfaces, and
collaborations and their relationships. Graphically, a class diagram is a collection of
vertices and arcs.
Contents:
Class diagrams commonly contain the following things:
Classes
Interfaces
Collaborations
Dependency, generalization and association relationships
b) Object diagrams
A given set of objects bound by certain relationships
together forms an object diagram. These diagrams are used
in modeling the static design view and process view of the system, and also in
modeling the organization of objects.
Contents:
An object diagram consists of two important elements:
Objects
Relationships
c9 Co*poet Di-)r-*s
A component is the physical implementation of classes and collaborations.
The architecture of a system can be explained with its components.
Therefore a component is the basic building block of a system. These diagrams
can be achieved by modeling various physical components like libraries, tables, files
etc. which reside within a given node.
Contents:
Components
Interfaces
Relationships
d) Deployment Diagrams
The deployment diagrams indicate the processing elements, processes, and
software components. The static deployment view of a system in terms of different
components and processes can be modeled by deployment diagrams.
A deployment diagram contains nodes and relationships (dependency and
association). This diagram is used to know which components will run on which
nodes (with the stereotype <<supports>>); similarly, the migration of components is
represented by the stereotype <<becomes>>.
Contents:
Nodes
Relationships
Behavioral Diagrams:
The behavioral diagrams are of five types. They are as follows.
Use case diagrams
Sequence diagrams
Collaboration diagrams
Activity diagrams
State chart diagrams
a) Use Case Diagrams
Use case diagrams are one of the five diagrams in the UML for modeling the
dynamic aspects of systems (activity diagrams, sequence diagrams, state chart
diagrams and collaboration diagrams are the four other kinds of diagrams in the UML
for modeling the dynamic aspects of systems). Use case diagrams are central to
modeling the behavior of the system, a sub-system, or a class. Each one shows a set of
use cases and actors and relationships.
Common Properties
A use case diagram is just a special kind of diagram and shares the same
common properties as do all other diagrams: a name and graphical contents that are a
projection into the model. What distinguishes a use case diagram from all other kinds
of diagrams is its particular content.
Contents:
Use case diagrams commonly contain:
Use cases
Actors
Dependency, generalization, and association relationships
Like all other diagrams, use case diagrams may contain notes and constraints.
Use case diagrams may also contain packages, which are used to group elements of
your model into larger chunks. Occasionally, you will want to place instances of use
cases in your diagrams as well, especially when you want to visualize a specific
executing system.
b) Sequence Diagrams
A sequence diagram is an interaction diagram that emphasizes the time
ordering of the messages. Graphically, a sequence diagram is a table that shows
objects arranged along the X-axis and messages, ordered in increasing time, along the
Y-axis.
Typically you place the object that initiates the interaction at the left, and
increasingly more subordinate objects to the right. Next, you place the messages that
these objects send and receive along the Y-axis, in order of increasing time from top
to bottom. This gives the reader a clear visual cue to the flow of control over time.
Sequence diagrams have two interesting features:
There is the object lifeline. An object lifeline is the vertical dashed line that
represents the existence of an object over a period of time. Most objects that appear in
interaction diagrams will be in existence for the duration of the interaction, so
these objects are all aligned at the top of the diagram, with their lifelines drawn from
the top of the diagram to the bottom.
There is the focus of control. The focus of control is a tall, thin rectangle that
shows the period of time during which an object is performing an action, either
directly or through a subordinate procedure. The top of the rectangle is aligned with
the start of the action; the bottom is aligned with its completion.
c) Collaboration Diagrams
Collaboration diagrams are analogous to sequence diagrams, since
these diagrams encompass various objects, their links, and the transmission and
receiving of messages. In this way they correspond to the structural aspects of the system
(while also providing a dynamic view of the system).
A collaboration diagram contains a set of objects, links, and the messages sent and
received by them.
d) Activity Diagrams
An activity diagram is essentially a flow chart showing the flow of control from
activity to activity. Activity diagrams are used to model the dynamic aspects of a system. They
can also be used to model the flow of an object as it moves from state to state at
different points in the flow of control.
An activity is an ongoing non-atomic execution within a state machine.
Activities ultimately result in some action, which is made up of executable atomic
computations that result in a change of state of the system.
e) State Chart Diagrams
A state chart diagram shows a state machine. State chart diagrams are used to
model the dynamic aspects of the system. For the most part this involves modeling the
behavior of reactive objects.
A reactive object is one whose behavior is best characterized by its response to
events dispatched from outside its context. A reactive object has a clear lifeline whose
current behavior is affected by its past.
A state chart diagram shows a state machine emphasizing the flow of control
from state to state. A state machine is a behavior that specifies the sequence of states
an object goes through during its lifetime in response to events, together with its
response to those events.
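A state machine of this kind can be sketched as a transition table. The Python example below is only an illustration: the states (LoggedOut, LoggedIn), the events, and the Session class are hypothetical and are not taken from the project's own state chart.

```python
# Hypothetical states and events for a login session (assumed for illustration).
TRANSITIONS = {
    ("LoggedOut", "login_ok"): "LoggedIn",
    ("LoggedOut", "login_failed"): "LoggedOut",
    ("LoggedIn", "search"): "LoggedIn",
    ("LoggedIn", "logout"): "LoggedOut",
}


class Session:
    """A reactive object: its next state depends on its current state
    (its past) and on the event dispatched to it from outside."""

    def __init__(self) -> None:
        self.state = "LoggedOut"
        self.history = [self.state]

    def dispatch(self, event: str) -> str:
        key = (self.state, event)
        if key not in TRANSITIONS:
            # The event is not handled in the current state, so it is ignored.
            return self.state
        self.state = TRANSITIONS[key]
        self.history.append(self.state)
        return self.state


session = Session()
for event in ["search", "login_ok", "search", "logout"]:
    session.dispatch(event)
```

The first "search" event is ignored because the object is still logged out, which shows how the same event can produce different responses depending on the object's past.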
A state is a condition in the life of the object during which it satisfies some
condition, performs some activity, or waits for some event. An event is a specification
of a significant occurrence that has a location in time and space. Graphically, a state
chart diagram is a collection of vertices and arcs. State chart diagrams commonly
contain.
Class Diagram:
In this class diagram we establish the connections between classes. In this
project the following are the user-defined classes.
Fig 3.4.2.3.4 class diagram
Use Case Diagrams:
It describes the functionalities of both the user and the administrator in the project.
[Use case diagram labels: System, Admin, User, Registration, Login, Add New Topic, Add Data, Search, Logout]
Fig 3.4.2.3.5 overview Use Case Diagram
Sequence Diagram
It describes how the process is carried out sequentially from the front end to the database.
[Sequence diagram participants: Admin, frmLogin, BAL ClsProducts, DAL SQLHelper, Database]
1 : Enter Credentials()
2 : Login()
3 : ExecuteDataset()
4 : Request of ExecuteDataset()
5 : Response of ExecuteDataset()
6 : Result()
7 : Show result()
Fig 3.4.2.3.6. Sequence Diagram overview
Activity Diagram
It describes the user registration activities.
[Activity diagram labels: Enter User Registration Details, Submit, Get the Details, Validate Data, Accept, Invalid, Yes, No, Returns Error Message, Successfully Registered]
Fig 3.4.2.3.7. Activity diagram example
Collaboration Diagram
It describes the activities that are performed in the Admin registration.
[Collaboration diagram objects: Admin, frmLogin, BAL ClsProducts, DAL SQLHelper, Database]
1 : Enter Credentials()
2 : Login()
3 : ExecuteDataset()
4 : Request of ExecuteDataset()
5 : Response of ExecuteDataset()
6 : Result()
7 : Show result()
Fig 3.4.2.3.8. Collaboration diagram example
Deployment Diagram:
This diagram describes how the user connects to all the activities.
[Deployment diagram labels: user, Request the site, Browse the files with dynamic template, Add to cart, Pay bill, Results, Give feedback]
Fig 3.4.2.3.9 Deployment diagram
3.5. Database Tables:
The database stores all information in the form of tables. Each table
holds its information in the form of rows and columns. The following are the
database tables in this project.
tbl_ Feedback
Table 3.5.1. Feedback
tbl_ login
Table 3.5.2. login
tbl_ Prategor!
Table 3.5.3. Product Category
tbl_ Product
Table 3.5.4. Product details
CHAPTER - 4
IMPLEMENTATION
4.1 INTRODUCTION
Implementation is the process of assuring that the information system is
operational and then allowing users to take over its operation for use and evaluation.
Implementation includes the following activities.
Installing the system and making it run on its intended hardware.
Providing user access to the system.
Training the users on the new system.
Documenting the system for its users and for those who will be responsible for
maintaining it in the future.
Making arrangements to support the users as the system is used.
Transferring ongoing responsibilities to operations.
Evaluating the operation and use of the system.
A key part of the implementation process is the system conversion.
The four basic conversion strategies include.
Direct Conversion:
In direct conversion the organization stops using the old system and starts using the new one
at the same time.
Parallel Conversion:
Parallel conversion involves running both the old system and the new system and
comparing their results. The new system is accepted only after the results have
matched for an acceptable period.
Pilot Conversion:
Pilot conversion means introducing the new system to a small part of the
organization, expanding its use once it is known to operate properly there.
Eventually it will be used by the entire organization.
Phased Conversion:
Phased conversion means introducing a system in stages, one component or
one module at a time, waiting until that one is operating properly before introducing
the next.
4.2. Overview of Implementation Language
4.2.1 Introduction To .Net Framework:
The Microsoft .NET Framework is a software technology that is available
with several Microsoft Windows operating systems. It includes a large library of pre-coded
solutions to common programming problems and a virtual machine that
manages the execution of programs written specifically for the framework. The .NET
Framework is a key Microsoft offering and is intended to be used by most new
applications created for the Windows platform.
The pre-coded solutions that form the framework's Base Class Library cover a
large range of programming needs in a number of areas, including user interface, data
access, database connectivity, cryptography, web application development, numeric
algorithms, and network communications. The class library is used by programmers,
who combine it with their own code to produce applications.
Programs written for the .NET Framework execute in a software environment
that manages the program's runtime requirements. Also part of the .NET Framework,
this runtime environment is known as the Common Language Runtime (CLR). The
CLR provides the appearance of an application virtual machine so that programmers
need not consider the capabilities of the specific CPU that will execute the program.
The CLR also provides other important services such as security, memory
management, and exception handling. The class library and the CLR together
compose the .NET Framework.
4.2.1.2. Principal design features
Interoperability
Because interaction between new and older applications is commonly
required, the .NET Framework provides means to access functionality that is
implemented in programs that execute outside the .NET environment. Access to COM
components is provided in the System.Runtime.InteropServices and
System.EnterpriseServices namespaces of the framework; access to other
functionality is provided using the P/Invoke feature.
Common Runtime Engine
The Common Language Runtime (CLR) is the virtual machine component of
the .NET framework. All .NET programs execute under the supervision of the CLR,
guaranteeing certain properties and behaviors in the areas of memory management,
security, and exception handling.
Base Class Library
The Base Class Library (BCL), part of the Framework Class Library (FCL), is
a library of functionality available to all languages using the .NET Framework. The
BCL provides classes which encapsulate a number of common functions, including
file reading and writing, graphic rendering, database interaction and XML document
manipulation.
Simplified Deployment
Installation of computer software must be carefully managed to ensure that it
does not interfere with previously installed software, and that it conforms to security
requirements. The .NET framework includes design features and tools that help
address these requirements.
Security
The design is meant to address some of the vulnerabilities, such as buffer
overflows, that have been exploited by malicious software. Additionally, .NET
provides a common security model for all applications.
Portability
The design of the .NET Framework allows it to theoretically be platform
agnostic, and thus cross-platform compatible. That is, a program written to use the
framework should run without change on any type of system for which the framework
is implemented. Microsoft's commercial implementations of the framework cover
Windows, Windows CE, and the Xbox 360.
In addition, Microsoft submits the
specifications for the Common Language Infrastructure (which includes the core class
libraries, Common Type System, and the Common Intermediate Language), the C#
language, and the C++/CLI language to both ECMA and the ISO, making them
available as open standards. This makes it possible for third parties to create
compatible implementations of the framework and its languages on other platforms.
Architecture
Fig 4.2.1.2.1. Visual overview of the Common Language Infrastructure
Common Language Infrastructure
The core aspects of the .NET framework lie within the Common Language
Infrastructure, or CLI. The purpose of the CLI is to provide a language-neutral
platform for application development and execution, including functions for exception
handling, garbage collection, security, and interoperability. Microsoft's
implementation of the CLI is called the Common Language Runtime or CLR.
Assemblies
The intermediate CIL code is housed in .NET assemblies. As mandated by
specification, assemblies are stored in the Portable Executable (PE) format, common
on the Windows platform for all DLL and EXE files. The assembly consists of one or
more files, one of which must contain the manifest, which has the metadata for the
assembly. The complete name of an assembly (not to be confused with the filename
on disk) contains its simple text name, version number, culture, and public key token.
The public key token is a unique hash generated when the assembly is compiled, thus
two assemblies with the same public key token are guaranteed to be identical from the
point of view of the framework. A private key can also be specified known only to the
creator of the assembly and can be used for strong naming and to guarantee that the
assembly is from the same author when a new version of the assembly is compiled
(required to add an assembly to the Global Assembly Cache).
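The four parts of the complete name are conventionally written as one comma-separated display name. The Python sketch below splits such a string into its parts. It is a simplified illustration: the helper names (AssemblyName, parse_display_name) are hypothetical, the sample string is the commonly seen identity of mscorlib in .NET 4.0, and the parser assumes all three attributes are present and that no value contains a comma.

```python
from typing import NamedTuple


class AssemblyName(NamedTuple):
    simple_name: str
    version: str
    culture: str
    public_key_token: str


def parse_display_name(display_name: str) -> AssemblyName:
    """Split 'Name, Version=..., Culture=..., PublicKeyToken=...' into its parts."""
    parts = [part.strip() for part in display_name.split(",")]
    simple_name, attributes = parts[0], parts[1:]
    # Each remaining part has the form key=value.
    fields = dict(attr.split("=", 1) for attr in attributes)
    return AssemblyName(
        simple_name=simple_name,
        version=fields["Version"],
        culture=fields["Culture"],
        public_key_token=fields["PublicKeyToken"],
    )


name = parse_display_name(
    "mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089"
)
```

Two display names that differ in any one of the four fields identify different assemblies, even if the file name on disk is the same.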
Metadata
All CLI is self-describing through .NET metadata. The CLR checks the
metadata to ensure that the correct method is called. Metadata is usually generated by
language compilers but developers can create their own metadata through custom
attributes. Metadata contains information about the assembly, and is also used to
implement the reflective programming capabilities of .NET Framework.
Security
.NET has its own security mechanism with two general features: Code Access
Security (CAS), and validation and verification. Code Access Security is based on
evidence that is associated with a specific assembly. Typically the evidence is the
source of the assembly (whether it is installed on the local machine or has been
downloaded from the intranet or Internet). Code Access Security uses evidence to
determine the permissions granted to the code. Other code can demand that calling
code is granted a specified permission. The demand causes the CLR to perform a call
stack walk: every assembly of each method in the call stack is checked for the
required permission; if any assembly is not granted the permission a security
exception is thrown.
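The call stack walk can be pictured with a small simulation. The Python sketch below is not the CLR's implementation or API; the function demand, the SecurityError class, the assembly names, and the permission names are all hypothetical and only model the rule that every assembly on the stack must hold the demanded permission.

```python
from typing import Dict, List, Set


class SecurityError(Exception):
    """Stands in for the security exception described in the text."""


def demand(permission: str, call_stack: List[str], grants: Dict[str, Set[str]]) -> None:
    """Walk the call stack and fail if any assembly on it lacks the permission."""
    for assembly in call_stack:
        if permission not in grants.get(assembly, set()):
            raise SecurityError(f"{assembly} is not granted {permission}")


# Hypothetical permission sets, as if derived from each assembly's evidence
# (local machine, intranet, or downloaded from the Internet).
grants = {
    "LocalApp": {"FileIO", "UI"},
    "IntranetLib": {"FileIO", "UI"},
    "DownloadedPlugin": {"UI"},
}

# Every caller holds FileIO, so the demand passes silently.
demand("FileIO", ["LocalApp", "IntranetLib"], grants)

# One caller on the stack lacks FileIO, so the demand fails.
try:
    demand("FileIO", ["LocalApp", "DownloadedPlugin"], grants)
    blocked = False
except SecurityError:
    blocked = True
```

Checking the whole stack rather than only the immediate caller prevents less trusted code from reaching a protected operation through a more trusted intermediary.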
When an assembly is loaded the CLR performs various tests. Two such tests
are validation and verification. During validation the CLR checks that the assembly
contains valid metadata and CIL, and whether the internal tables are correct.
Verification is not so exact. The verification mechanism checks to see if the code does
anything that is 'unsafe'. The algorithm used is quite conservative; hence occasionally
code that is 'safe' does not pass. Unsafe code will only be executed if the assembly has
the 'skip verification' permission, which generally means code that is installed on the
local machine.
.NET Framework uses appdomains as a mechanism for isolating code running
in a process. Appdomains can be created and code loaded into or unloaded from them
independent of other appdomains. This helps increase the fault tolerance of the
application, as faults or crashes in one appdomain do not affect rest of the application.
Appdomains can also be configured independently with different security privileges.
Class library
Namespaces in the BCL
System
System.CodeDom
System.Collections
System.Diagnostics
System.Globalization
System.IO
System.Resources
System.Text
System.Text.RegularExpressions
Table 4.2.1.2.1 Basic Class Libraries
Microsoft .NET Framework includes a set of standard class libraries. The
class library is organized in a hierarchy of namespaces. Most of the built in APIs are
part of either System.* or Microsoft.* namespaces. It encapsulates a large number of
common functions, such as file reading and writing, graphic rendering, database
interaction, and XML document manipulation, among others. The .NET class libraries
are available to all .NET languages. The .NET Framework class library is divided into
two parts: the Base Class Library and the Framework Class Library.
The Base Class Library (BCL) includes a small subset of the entire class
library and is the core set of classes that serve as the basic API of the Common
Language Runtime.
The classes in mscorlib.dll and some of the classes in System.dll
and System.core.dll are considered to be a part of the BCL. The BCL classes are
available in both .NET Framework as well as its alternative implementations
including .NET Compact Framework, Microsoft Silverlight and Mono.
The Framework Class Library (FCL) is a superset of the BCL classes and
refers to the entire class library that ships with .NET Framework. It includes an
expanded set of libraries, including WinForms, ADO.NET, ASP.NET, Language
Integrated Query, Windows Presentation Foundation, Windows Communication
Foundation among others. The FCL is much larger in scope than standard libraries for
languages like C++, and comparable in scope to the standard libraries of Java.
Memory management
The .NET Framework CLR frees the developer from the burden of managing
memory (allocating and freeing up when done); instead it does the memory
management itself. To this end, the memory allocated to instantiations of .NET types
(objects) is done contiguously from the managed heap, a pool of memory managed by
the CLR. As long as there exists a reference to an object, which might be either a
direct reference to an object or via a graph of objects, the object is considered to be in
use by the CLR. When there is no reference to an object, and it cannot be reached or
used, it becomes garbage. However, it still holds on to the memory allocated to it.
.NET Framework includes a garbage collector which runs periodically, on a separate
thread from the application's thread, that enumerates all the unusable objects and
reclaims the memory allocated to them.
The GC used by .NET Framework is actually generational.
Objects are
assigned a generation; newly created objects belong to Generation 0. The objects that
survive a garbage collection are tagged as Generation 1, and the Generation 1 objects
that survive another collection are Generation 2 objects. The .NET Framework uses
up to Generation 2 objects. Higher generation objects are garbage collected less
frequently than lower generation objects. This helps increase the efficiency of garbage
collection, as older objects tend to have a longer lifetime than newer objects. Thus, by
removing older (and thus more likely to survive a collection) objects from the scope
of a collection run, fewer objects need to be checked and compacted.
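The promotion scheme can be illustrated with a toy model. The Python sketch below is not how the CLR collector is implemented: the function collect and the object names are hypothetical, and liveness is passed in as a ready-made set, whereas a real collector derives it by tracing references from the application's roots.

```python
from typing import Dict, List, Set

MAX_GENERATION = 2  # the text states that generations 0, 1 and 2 are used


def collect(heap: Dict[str, int], live: Set[str], generation: int) -> List[str]:
    """Collect generations 0..generation and return the names that were reclaimed.

    heap maps an object name to its current generation. Objects in older
    generations than the one being collected are not examined at all.
    """
    reclaimed = []
    for name in list(heap):
        if heap[name] > generation:
            continue  # outside the scope of this collection run
        if name in live:
            heap[name] = min(heap[name] + 1, MAX_GENERATION)  # survivor is promoted
        else:
            del heap[name]
            reclaimed.append(name)
    return reclaimed


heap = {"config": 0, "request1": 0}
first = collect(heap, live={"config"}, generation=0)   # "config" survives, is promoted
heap["request2"] = 0
second = collect(heap, live={"config"}, generation=0)  # "config" is now skipped
```

In the second run only the newly allocated object is examined, which is the saving the paragraph above describes: long-lived objects move out of the frequently collected generation.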
Versions
Microsoft started development on the .NET Framework in the late 1990s
originally under the name of Next Generation Windows Services (NGWS). By late
2000 the first beta versions of .NET 1.0 were released.
Fig 4.2.1.2.2. .NET Architecture
Version    Version Number    Release Date
1.0        1.0.3705.0        2002-01-05
1.1        1.1.4322.573      2003-04-01
2.0        2.0.50727.42      2005-11-07
3.0        3.0.4506.30       2006-11-06
3.5        3.5.21022.8       2007-11-09
4.0        4.0.30319.1       2010-04-12
Table 4.2.1.2.2. The .NET Framework stack
4.2.1.3. Client Application Development
Client applications are the closest to a traditional style of application in
Windows-based programming. These are the types of applications that display
windows or forms on the desktop, enabling a user to perform a task. Client
applications include applications such as word processors and spreadsheets, as well as
custom business applications such as data-entry tools, reporting tools, and so on.
Client applications usually employ windows, menus, buttons, and other GUI
elements, and they likely access local resources such as the file system and
peripherals such as printers. Another kind of client application is the traditional
ActiveX control (now replaced by the managed Windows Forms control) deployed
over the Internet as a Web page. This application is much like other client
applications: it is executed natively, has access to local resources, and includes
graphical elements.
In the past, developers created such applications using C/C++ in conjunction
with the Microsoft Foundation Classes (MFC) or with a rapid application
development (RAD) environment such as Microsoft® Visual Basic®. The .NET
Framework incorporates aspects of these existing products into a single, consistent
development environment that drastically simplifies the development of client
applications.
The Windows Forms classes contained in the .NET Framework are designed
to be used for GUI development. You can easily create command windows, buttons,
menus, toolbars, and other screen elements with the flexibility necessary to
accommodate shifting business needs.
For example, the .NET Framework provides simple properties to adjust visual
attributes associated with forms. In some cases the underlying operating system does
not support changing these attributes directly, and in these cases the .NET Framework
automatically recreates the forms. This is one of many ways in which the .NET
Framework integrates the developer interface, making coding simpler and more
consistent.
4.2.2. ASP.NET
4.2.2.1. Server Application Development
Server-side applications in the managed world are implemented through
runtime hosts. Unmanaged applications host the common language runtime, which
allows your custom managed code to control the behavior of the server. This model
provides you with all the features of the common language runtime and class library
while gaining the performance and scalability of the host server.
The following illustration shows a basic network schema with managed code
running in different server environments. Servers such as IIS and SQL Server can
perform standard operations while your application logic executes through the
managed code.
4.2.2.2. Server-Side Managed Code
ASP.NET is the hosting environment that enables developers to use the .NET
Framework to target Web-based applications. However, ASP.NET is more than just a
runtime host; it is a complete architecture for developing Web sites and Internet-distributed
objects using managed code. Both Web Forms and XML Web services use
IIS and ASP.NET as the publishing mechanism for applications, and both have a
collection of supporting classes in the .NET Framework.
XML Web services, an important evolution in Web-based technology, are
distributed, server-side application components similar to common Web sites.
However, unlike Web-based applications, XML Web services components have no UI
and are not targeted for browsers such as Internet Explorer and Netscape Navigator.
Instead, XML Web services consist of reusable software components designed to be
consumed by other applications, such as traditional client applications, Web-based
applications, or even other XML Web services. As a result, XML Web services
technology is rapidly moving application development and deployment into the highly
distributed environment of the Internet.
If you have used earlier versions of ASP technology, you will immediately
notice the improvements that ASP.NET and Web Forms offer. For example, you can
develop Web Forms pages in any language that supports the .NET Framework. In
addition, your code no longer needs to share the same file with your HTTP text
(although it can continue to do so if you prefer). Web Forms pages execute in native
machine language because, like any other managed application, they take full
advantage of the runtime. In contrast, unmanaged ASP pages are always scripted and
interpreted. ASP.NET pages are faster, more functional, and easier to develop than
unmanaged ASP pages because they interact with the runtime like any managed
application.
The .NET Framework also provides a collection of classes and tools to aid in
development and consumption of XML Web services applications. XML Web services
are built on standards such as SOAP (a remote procedure-call protocol), XML (an
extensible data format), and WSDL (the Web Services Description Language).
The .NET Framework is built on these standards to promote interoperability with
non-Microsoft solutions.
For example, the Web Services Description Language tool included with
the .NET Framework SDK can query an XML Web service published on the Web,
parse its WSDL description, and produce C# or Visual Basic source code that your
application can use to become a client of the XML Web service. The source code can
create classes derived from classes in the class library that handle all the underlying
communication using SOAP and XML parsing. Although you can use the class library
to consume XML Web services directly, the Web Services Description Language tool
and the other tools contained in the SDK facilitate your development efforts with
the .NET Framework.
If you develop and publish your own XML Web service, the .NET Framework
provides a set of classes that conform to all the underlying communication standards,
such as SOAP, WSDL, and XML. Using those classes enables you to focus on the
logic of your service, without concerning yourself with the communications
infrastructure required by distributed software development.
Finally, like Web Forms pages in the managed environment, your XML Web
service will run with the speed of native machine language using the scalable
communication of IIS.
4.2.2.3. Active Server Pages.NET
ASP.NET is a programming framework built on the common language
runtime that can be used on a server to build powerful Web applications. ASP.NET
offers several important advantages over previous Web development models:
Enhanced Performance. ASP.NET is compiled common language runtime
code running on the server. Unlike its interpreted predecessors, ASP.NET can
take advantage of early binding, just-in-time compilation, native optimization,
and caching services right out of the box. This amounts to dramatically better
performance before you ever write a line of code.
World-Class Tool Support. The ASP.NET framework is complemented by a
rich toolbox and designer in the Visual Studio integrated development
environment. WYSIWYG editing, drag-and-drop server controls, and
automatic deployment are just a few of the features this powerful tool
provides.
Power and Flexibility. Because ASP.NET is based on the common language
runtime, the power and flexibility of that entire platform is available to Web
application developers. The .NET Framework class library, Messaging, and
Data Access solutions are all seamlessly accessible from the Web. ASP.NET is
also language-independent, so you can choose the language that best applies to
your application or partition your application across many languages. Further,
common language runtime interoperability guarantees that your existing
investment in COM-based development is preserved when migrating to
ASP.NET.
Simplicity. ASP.NET makes it easy to perform common tasks, from simple
form submission and client authentication to deployment and site
configuration. For example, the ASP.NET page framework allows you to build
user interfaces that cleanly separate application logic from presentation code
and to handle events in a simple, Visual Basic-like forms processing model.
Additionally, the common language runtime simplifies development, with
managed code services such as automatic reference counting and garbage
collection.
Manageability. ASP.NET employs a text-based, hierarchical configuration
system, which simplifies applying settings to your server environment and
Web applications. Because configuration information is stored as plain text,
new settings may be applied without the aid of local administration tools. This
"zero local administration" philosophy extends to deploying ASP.NET
Framework applications as well. An ASP.NET Framework application is
deployed to a server simply by copying the necessary files to the server. No
server restart is required, even to deploy or replace running compiled code.
Scalability and Availability. ASP.NET has been designed with scalability in
mind, with features specifically tailored to improve performance in clustered
and multiprocessor environments. Further, processes are closely monitored
and managed by the ASP.NET runtime, so that if one misbehaves (leaks,
deadlocks), a new process can be created in its place, which helps keep your
application constantly available to handle requests.
Customizability and Extensibility. ASP.NET delivers a well-factored
architecture that allows developers to "plug-in" their code at the appropriate
level. In fact, it is possible to extend or replace any subcomponent of the
ASP.NET runtime with your own custom-written component. Implementing
custom authentication or state services has never been easier.
Security. With built in Windows authentication and per-application
configuration, you can be assured that your applications are secure.
WHAT IS ASP.NET WEB FORMS?
The ASP.NET Web Forms page framework is a scalable common language
runtime programming model that can be used on the server to dynamically generate
Web pages.
Intended as a logical evolution of ASP (ASP.NET provides syntax
compatibility with existing pages), the ASP.NET Web Forms framework has been
specifically designed to address a number of key deficiencies in the previous model.
In particular, it provides:
The ability to create and use reusable UI controls that can encapsulate
common functionality and thus reduce the amount of code that a page
developer has to write.
The ability for developers to cleanly structure their page logic in an orderly
fashion (not "spaghetti code").
The ability for development tools to provide strong WYSIWYG design
support for pages (existing ASP code is opaque to tools).
ASP.NET Web Forms pages are text files with an .aspx file name extension.
They can be deployed throughout an IIS virtual root directory tree. When a browser
client requests .aspx resources, the ASP.NET runtime parses and compiles the target
file into a .NET Framework class. This class can then be used to dynamically process
incoming requests. (Note that the .aspx file is compiled only the first time it is
accessed; the compiled type instance is then reused across multiple requests.)
An ASP.NET page can be created simply by taking an existing HTML file and
changing its file name extension to .aspx (no modification of code is required). For
example, the following sample demonstrates a simple HTML page that collects a
user's name and category preference and then performs a form postback to the
originating page when a button is clicked:
ASP.NET provides syntax compatibility with existing ASP pages. This
includes support for <% %> code render blocks that can be intermixed with HTML
content within an .aspx file. These code blocks execute in a top-down manner at page
render time.
CODE-BEHIND WEB FORMS
ASP.NET supports two methods of authoring dynamic pages. The first is the
method shown in the preceding samples, where the page code is physically declared
within the originating .aspx file. An alternative approach, known as the code-behind
method, enables the page code to be more cleanly separated from the HTML content
into an entirely separate file.
4.2.2.4. INTRODUCTION TO ASP.NET SERVER CONTROLS
In addition to (or instead of) using <% %> code blocks to program dynamic
content, ASP.NET page developers can use ASP.NET server controls to program Web
pages. Server controls are declared within an .aspx file using custom tags or intrinsic
HTML tags that contain a runat="server" attribute value. Intrinsic HTML tags are
handled by one of the controls in the System.Web.UI.HtmlControls namespace. Any
tag that doesn't explicitly map to one of the controls is assigned the type of
System.Web.UI.HtmlControls.HtmlGenericControl.
Server controls automatically maintain any client-entered values between
round trips to the server. This control state is not stored on the server (it is instead
stored within an <input type="hidden"> form field that is round-tripped between
requests). Note also that no client-side script is required.
In addition to supporting standard HTML input controls, ASP.NET enables
developers to utilize richer custom controls on their pages. For example, the following
sample demonstrates how the <asp:adrotator> control can be used to dynamically
display rotating ads on a page.
ASP.NET Web Forms provide an easy and powerful way to build dynamic
Web UI.
ASP.NET Web Forms pages can target any browser client (there are no script
library or cookie requirements).
ASP.NET Web Forms pages provide syntax compatibility with existing ASP
pages.
ASP.NET server controls provide an easy way to encapsulate common
functionality.
ASP.NET ships with 45 built-in server controls. Developers can also use
controls built by third parties.
ASP.NET server controls can automatically project both up-level and
down-level HTML.
ASP.NET templates provide an easy way to customize the look and feel of list
server controls.
ASP.NET validation controls provide an easy way to do declarative client or
server data validation.
4.2.3. Introduction to ADO.NET
ADO.NET is an object-oriented set of libraries that allows you to interact with
data sources. Commonly, the data source is a database, but it could also be a text file,
an Excel spreadsheet, or an XML file. For the purposes of this tutorial, we will look
at ADO.NET as a way to interact with a database.
Data Providers
We know that ADO.NET allows us to interact with different types of data
sources and different types of databases. However, there isn't a single set of classes
that allows you to accomplish this universally. Since different data sources expose
different protocols, we need a way to communicate with the right data source using
the right protocol. Some older data sources use the ODBC protocol, many newer data
sources use the OleDb protocol, and there are more data sources every day that allow
you to communicate with them directly through .NET ADO.NET class libraries.
ADO.NET provides a relatively common way to interact with data sources,
but comes in different sets of libraries for each way you can talk to a data source.
These libraries are called Data Providers and are usually named for the protocol or
data source type they allow you to interact with. Table 4.2.3.1 lists some well-known
data providers, the API prefix they use, and the type of data source they allow you to
interact with.
Provider Name         | API prefix | Data Source Description
ODBC Data Provider    | Odbc       | Data sources with an ODBC interface. Normally older databases.
OleDb Data Provider   | OleDb      | Data sources that expose an OleDb interface, e.g. Access or Excel.
Oracle Data Provider  | Oracle     | For Oracle databases.
SQL Data Provider     | Sql        | For interacting with Microsoft SQL Server.
Borland Data Provider | Bdp        | Generic access to many databases such as Interbase, SQL Server, IBM DB2, and Oracle.
Table 4.2.3.1. ADO.NET Data Providers are class libraries that allow a
common way to interact with specific data sources or protocols. The library
APIs have prefixes that indicate which provider they support.
An example may help you to understand the meaning of the API prefix. One
of the first ADO.NET objects you'll learn about is the connection object, which allows
you to establish a connection to a data source. If we were using the OleDb Data
Provider to connect to a data source that exposes an OleDb interface, we would use a
connection object named OleDbConnection. Similarly, the connection object name
would be prefixed with Odbc or Sql for an OdbcConnection object on an Odbc data
source or a SqlConnection object on a SQL Server database, respectively. Since we
are using MSDE in this tutorial (a scaled down version of SQL Server), all the API
objects will have the Sql prefix, e.g. SqlConnection.
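The naming convention described above can be sketched in a few lines. This is only an illustration written in Python rather than a .NET language; the prefixes come from the table above, while the dictionary and the helper name `api_object_name` are hypothetical.

```python
# Provider prefixes as listed in the data provider table.
# The helper below is a hypothetical illustration of the naming rule,
# not part of ADO.NET itself.
PROVIDER_PREFIXES = {
    "ODBC Data Provider": "Odbc",
    "OleDb Data Provider": "OleDb",
    "Oracle Data Provider": "Oracle",
    "SQL Data Provider": "Sql",
    "Borland Data Provider": "Bdp",
}


def api_object_name(provider, kind):
    """Build a provider-specific object name such as 'SqlConnection'."""
    return PROVIDER_PREFIXES[provider] + kind


if __name__ == "__main__":
    for provider in PROVIDER_PREFIXES:
        print(provider, "->", api_object_name(provider, "Connection"))
```

The same rule applies to the other provider-specific objects, so the SQL Data Provider's command object is named SqlCommand.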
ADO.NET Objects
ADO.NET includes many objects you can use to work with data. This section
introduces some of the primary objects you will use. Over the course of this tutorial,
you'll be exposed to many more ADO.NET objects from the perspective of how they
are used in a particular lesson. The objects below are the ones you must know.
Learning about them will give you an idea of the types of things you can do with data
when using ADO.NET.
The SqlConnection Object
To interact with a database, you must have a connection to it. The connection
helps identify the database server, the database name, user name, password, and other
parameters that are required for connecting to the database. A connection object is
used by command objects so they will know which database to execute the command
on.
The SqlCommand Object
The process of interacting with a database means that you must specify the
actions you want to occur. This is done with a command object. You use a command
object to send SQL statements to the database. A command object uses a connection
object to figure out which database to communicate with. You can use a command
object alone, to execute a command directly, or assign a reference to a command
object to an SqlDataAdapter, which holds a set of commands that work on a group of
data as described below.
The SqlDataReader Object
Many data operations require that you only get a stream of data for reading.
The data reader object allows you to obtain the results of a SELECT statement from a
command object. For performance reasons, the data returned from a data reader is a
fast forward-only stream of data. This means that you can only pull the data from the
stream in a sequential manner. This is good for speed, but if you need to manipulate
data, then a DataSet is a better object to work with.
The DataSet Object
DataSet objects are in-memory representations of data. They contain multiple
DataTable objects, which contain columns and rows, just like normal database tables.
You can even define relations between tables to create parent-child relationships. The
DataSet is specifically designed to help manage data in memory and to support
disconnected operations on data, when such a scenario makes sense. The DataSet is an
object that is used by all of the Data Providers, which is why it does not have a Data
Provider specific prefix.
The SqlDataAdapter Object
Sometimes the data you work with is primarily read-only and you rarely need
to make changes to the underlying data source. Some situations also call for caching
data in memory to minimize the number of database calls for data that does not
change. The data adapter makes it easy for you to accomplish these things by helping
to manage data in a disconnected mode. The data adapter fills a DataSet object when
reading the data and writes in a single batch when persisting changes back to the
database. A data adapter contains a reference to the connection object and opens and
closes the connection automatically when reading from or writing to the database.
Additionally, the data adapter contains command object references for SELECT,
INSERT, UPDATE, and DELETE operations on the data. You will have a data
adapter defined for each table in a DataSet and it will take care of all communication
with the database for you. All you need to do is tell the data adapter when to load
from or write to the database.
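The two access patterns described above, a forward-only stream versus an in-memory copy that is written back in one batch, can be sketched outside .NET as well. The following is an analogy only: it uses Python's built-in sqlite3 module as a stand-in for SQL Server and ADO.NET, and the `product` table and its rows are made up for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for a SQL Server connection.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, price INTEGER)")
conn.executemany(
    "INSERT INTO product VALUES (?, ?, ?)",
    [(1, "Pen", 2), (2, "Book", 12), (3, "Lamp", 30)],
)
conn.commit()

# Connected, forward-only access (the data reader pattern):
# rows are pulled one at a time, in order.
names = []
for row in conn.execute("SELECT name FROM product ORDER BY id"):
    names.append(row[0])

# Disconnected access (the DataSet / data adapter pattern):
# fill an in-memory copy, change it without touching the database,
# then persist all changes in a single batch.
cache = [list(r) for r in conn.execute("SELECT id, name, price FROM product ORDER BY id")]
for record in cache:
    record[2] += 1  # in-memory change only
conn.executemany(
    "UPDATE product SET price = ? WHERE id = ?",
    [(price, pid) for pid, _name, price in cache],
)
conn.commit()

prices = [r[0] for r in conn.execute("SELECT price FROM product ORDER BY id")]
print(names)
print(prices)
```

In ADO.NET the fill and the batched write would be handled by the data adapter; here they are written out by hand to make the two steps visible.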
4.2.4. C# .NET
C#, pronounced "C sharp", is a computer language used to give instructions that
tell the computer what to do, how to do it, and when to do it. It is a general-purpose
language that is used on many operating systems, including Microsoft Windows. C# is
one of the languages used in the Microsoft .NET Framework. The Microsoft .NET
Framework provides a runtime and a library of classes on which such programs are built.
The programs we will write are meant to give instructions to the computer
about what to do, when to do something, and how to do it. You write these
instructions in an easy to understand English format, using words we will study. This
means that a regular instruction uses normal text with alphabetic characters, numbers,
and other symbols. Normally, you can write your instructions using any text
editor such as Notepad, WordPad, WordPerfect, or Microsoft Word. When writing
your instructions, there are rules you must follow and suggestions you should
observe.
The group of instructions used by your program is also referred to as code. To
assist you with writing code, Microsoft Visual C# 2008 includes a text editor referred
to as the Code Editor. This is the window that displays when you have just created a
console application. Besides the Code Editor, the integrated development environment
(IDE) of Microsoft Visual C# 2008 is made of various parts, which we will review
when necessary.
4.3. Overview of Implementation Data Base
4.3.1 SQL SERVER
A database management system, or DBMS, gives the user access to their data and
helps them transform the data into information. Such database management systems
include dBase, Paradox, IMS, and SQL Server. These systems allow
users to create, update and extract information from their database.
A database is a structured collection of data. Data refers to the characteristics
of people, things and events. SQL Server stores each data item in its own field. In
SQL Server, the fields relating to a particular person, thing or event are bundled
together to form a single complete unit of data, called a record (it can also be referred
to as a row or an occurrence). Each record is made up of a number of fields. No two
fields in a record can have the same field name. During an SQL Server database
design project, the analysis of your business needs identifies all the fields or attributes
of interest. If your business needs change over time, you define any additional fields
or change the definition of existing fields.
SQL Server Tables
SQL Server stores records relating to each other in a table. Different tables are
created for the various groups of information. Related tables are grouped together to
form a database.
Primary Key
Each table in SQL Server should have a field or a combination of fields that uniquely
identifies each record in the table. The unique identifier is called the Primary Key, or
simply the Key. The primary key provides the means to distinguish one record from
all others in a table. It allows the user and the database system to identify, locate and
refer to one particular record in the database.
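A minimal sketch of how a primary key behaves, assuming Python's built-in sqlite3 module as a stand-in for SQL Server; the `employee` table and its rows are hypothetical and serve only as an illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# emp_id is declared as the primary key of the (hypothetical) employee table.
conn.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO employee VALUES (1, 'Asha')")
conn.execute("INSERT INTO employee VALUES (2, 'Ravi')")

# A second record with an existing key value is rejected,
# so the key keeps identifying exactly one record.
try:
    conn.execute("INSERT INTO employee VALUES (1, 'Duplicate')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Looking a record up by its key returns at most one row.
found = conn.execute("SELECT name FROM employee WHERE emp_id = ?", (2,)).fetchall()
print(duplicate_rejected, found)
```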
Relational Database
Sometimes all the information of interest to a business operation can be stored
in one table. When it cannot, SQL Server makes it very easy to link the data in
multiple tables. Matching an employee to the department in which they work is one
example. This is what makes SQL Server a relational database management system,
or RDBMS. It stores data in two or more tables and enables you to define
relationships between the tables.
Foreign Key
When a field in one table matches the primary key of another table, it is referred
to as a foreign key. A foreign key is a field or a group of fields in one table whose
values match those of the primary key of another table.
Referential Integrity
Not only does SQL Server allow you to link multiple tables, it also maintains
consistency between them. Ensuring that the data among related tables is correctly
matched is referred to as maintaining referential integrity.
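The employee and department example mentioned earlier can be used to sketch a foreign key and the referential integrity check that goes with it. The code below is an illustration under stated assumptions: Python's sqlite3 module stands in for SQL Server, the table and column names are hypothetical, and SQLite only enforces foreign keys once `PRAGMA foreign_keys = ON` has been issued.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    """CREATE TABLE employee (
           emp_id  INTEGER PRIMARY KEY,
           name    TEXT,
           dept_id INTEGER REFERENCES department(dept_id)  -- foreign key
       )"""
)
conn.execute("INSERT INTO department VALUES (10, 'Sales')")
conn.execute("INSERT INTO employee VALUES (1, 'Asha', 10)")

# The foreign key links each employee to the matching department record.
rows = conn.execute(
    """SELECT e.name, d.name
       FROM employee e JOIN department d ON e.dept_id = d.dept_id"""
).fetchall()


def violates_integrity(sql):
    """Return True if the statement is rejected by an integrity constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


# An employee pointing at a department that does not exist is rejected...
orphan_rejected = violates_integrity("INSERT INTO employee VALUES (2, 'Ravi', 99)")
# ...and so is deleting a department that employees still refer to.
delete_rejected = violates_integrity("DELETE FROM department WHERE dept_id = 10")
print(rows, orphan_rejected, delete_rejected)
```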
Data Abstraction
A major purpose of a database system is to provide users with an abstract view
of the data. The system hides certain details of how the data is stored and
maintained. Data abstraction is divided into three levels.
Physical level: This is the lowest level of abstraction, at which one describes how the
data are actually stored.
Conceptual level: At this level of abstraction one describes what data are actually
stored, together with the entities, their attributes and the relationships among them.
View level: This is the highest level of abstraction, at which one describes only part
of the database.
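The difference between the conceptual level and the view level can be illustrated with a database view. This is a small sketch that assumes Python's sqlite3 module as a stand-in for SQL Server; the `employee` table and the `employee_directory` view are hypothetical names. The physical level does not appear in the code, because the storage layout is handled entirely by the database engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual level: the full description of what is stored.
conn.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, salary INTEGER)")
conn.execute("INSERT INTO employee VALUES (1, 'Asha', 50000)")

# View level: only part of the database is exposed (no salary column).
conn.execute("CREATE VIEW employee_directory AS SELECT emp_id, name FROM employee")

cursor = conn.execute("SELECT * FROM employee_directory")
view_columns = [column[0] for column in cursor.description]
view_rows = cursor.fetchall()
print(view_columns, view_rows)
```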
CHAPTER - 5
TESTING
5.1 Introduction
Software Testing:
Software testing is a critical element of software quality assurance and
represents the ultimate review of specification, design and code. Testing represents
an interesting anomaly for the software: during the earlier definition and development
phases, the aim was to build software from an abstract concept into a tangible
implementation, whereas testing sets out to expose the faults in what was built.
The testing phase involves testing the developed system using
various techniques such as White Box Testing and Control Structure Testing.
5.2 Testing Fundamentals:
Testing Methodologies:
Black Box Testing.
White Box Testing.
Gray Box Testing.
5.2.1. Black box Testing:
This testing method considers a module as a single unit and checks the unit at its
interface and in its communication with other modules, rather than getting into details
at the statement level. Here the module is treated as a black box that takes some
input and generates output. The output for a given set of input combinations is
forwarded to other modules.
Black Box Testing in this Project: I tested each and every module by considering
each module as a unit. I prepared sets of input combinations and checked
the outputs for those inputs. I also tested whether the communication from one
module to another works correctly.
5.2.2. White box Testing.
White Box Testing mainly focuses on the internal working of the product.
Here a part is taken at a time and tested thoroughly at the statement level to find
the maximum possible number of errors. A loop is also constructed so that the part is
tested within a range; that is, the part is executed at its boundary values and
within bounds for the purpose of testing.
White Box Testing in this Project: I tested every piece of code step by step, taking care
that every statement in the code is executed at least once. I generated a list of test
cases and sample data, which is used to check all possible combinations of execution
paths through the code at every module level.
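The boundary-value idea described above can be sketched as follows. The function `discount_for` is a hypothetical unit under test invented for this illustration (it is not part of the project code); the check loops over the whole valid range, so every branch is executed at its boundaries and within bounds, and then probes the values just outside the range.

```python
def discount_for(quantity):
    """Hypothetical unit under test: percent discount for an order of 1..100 items."""
    if quantity < 1 or quantity > 100:
        raise ValueError("quantity out of range")
    if quantity >= 50:
        return 10
    return 0


def run_white_box_checks():
    """Execute every branch of discount_for and return the number of checks made."""
    checks = 0
    # Loop over the whole valid range, including both boundaries (1 and 100)
    # and the internal boundary where the discount starts (50).
    for quantity in range(1, 101):
        expected = 10 if quantity >= 50 else 0
        assert discount_for(quantity) == expected
        checks += 1
    # Values just outside each boundary must take the error branch.
    for quantity in (0, 101):
        try:
            discount_for(quantity)
        except ValueError:
            checks += 1
        else:
            raise AssertionError("out-of-range value was accepted")
    return checks


if __name__ == "__main__":
    print(run_white_box_checks(), "checks passed")
```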
5.2.3. Gray Box Testing.
Gray Box Testing is the process in which a combination of black box and
white box techniques is used.
5.3 Testing Strategy:
A strategy for software testing must accommodate low-level tests that are necessary to
verify that a small source code segment has been correctly implemented, as well as
high-level tests that validate the system against customer requirements.
Level of testing:
In order to uncover the errors present in different phases we have the concept
of levels of testing. The basic levels of testing are as shown below:
Unit Testing.
Integration Testing.
System Testing.
User Acceptance Testing.
Fig 5.3.1. Levels of testing (Client Needs: Acceptance Testing; Requirements: System
Testing; Design: Integration Testing; Code: Unit Testing)
Unit Testing:
Unit testing focuses verification effort on the smallest unit of software, i.e. the
module. Using the detailed design and the process specifications, testing is done to
uncover errors within the boundary of the module. All modules must be successful in
the unit test before integration testing begins.
Unit Testing in this project: In this project each service can be thought of as a module.
There are many modules, such as login, adding products, change password and
user view products. Each module was tested during its development and again
once development was finished, so that each module works without any error.
The inputs are validated when they are accepted from the user.
Integration Testing
After unit testing we have to perform integration testing. The goal here is to see whether
the modules can be integrated properly. This testing activity can be considered as
testing the design, hence the emphasis on testing module interactions. It also
helps to uncover a set of errors associated with interfacing. Here the input to
integration testing is the set of unit-tested modules.
Integration testing is classified into two types:
Top-Down Integration Testing.
Bottom-Up Integration Testing.
In Top-Down Integration Testing, modules are integrated by moving downward
through the control hierarchy, beginning with the main control module.
In Bottom-Up Integration Testing, each sub module is tested separately and then the
full system is tested.
Integration Testing in this project:
In this project, integrating all the modules forms the main system, so I
used Bottom-Up Integration Testing. When integrating the modules, I checked
whether the integration affects the working of any of the services by giving
different combinations of inputs with which the services ran perfectly
before integration.
System Testing
System testing is an important phase without which the system can't be
released to the end users. It is aimed at ensuring that all the processes work
accurately according to the specification.
System Testing in this project:
Here the entire 'system' has been tested against the requirements of the project, and it
is checked whether all requirements of the project have been satisfied.
Acceptance Testing
Acceptance testing is performed with realistic data of the client to demonstrate
that the software is working satisfactorily. Testing here is focused on the external
behavior of the system; the internal logic of the program is not emphasized.
Acceptance Testing in this project:
In this project I collected some data belonging to the shopping site
and tested whether the project works correctly.
Test Planning:
1. A test plan is a strategic document.
2. This document involves the scope of testing,
3. Objective of testing,
4. Areas that need to be tested,
5. Areas that should not be tested,
6. Scheduling and resource planning,
7. Areas to be automated and the various testing tools used.
Types of testing:
Smoke Testing.
Sanity Testing.
Regression Testing.
Re-Testing.
Static Testing.
Dynamic Testing.
Alpha Testing.
Beta Testing.
Monkey Testing.
Compatibility Testing.
Installation Testing.
Adhoc Testing.
Smoke Testing: the process of initial testing in which the tester looks for the
availability of all the functionality of the application in order to perform detailed
testing on it. (The main check is for available forms.)
Sanity Testing: a type of testing that is conducted on an application initially to
check for its proper behavior, that is, to check that all the functionality
is available before detailed testing is conducted on it.
Regression Testing: one of the most important types of testing. Regression testing is
the process in which functionality that has already been tested is tested again
whenever some new change is added, in order to check whether the existing
functionality remains the same.
Re-Testing: the process in which testing is performed on some functionality
that has already been tested, to make sure that the defects are reproducible and to
rule out environment issues if any defects are found.
Static Testing: testing that is performed on an application while it is not
being executed. Ex: GUI, document testing.
Dynamic Testing: testing that is performed on an application while it is
being executed. Ex: functional testing.
Alpha Testing: a type of user acceptance testing that is conducted on an
application just before it is released to the customer.
Beta Testing: a type of UAT that is conducted on an application when it is
released to the customer, deployed into the real-time environment and
accessed by real-time users.
Monkey Testing: the process in which abnormal and beyond-capacity
operations are performed on the application to check its stability in spite of the users'
abnormal behavior.
Compatibility Testing: the testing process in which the product is
tested on environments with different combinations of databases, application
servers, browsers, etc., in order to check how far the product is compatible with all
these environment and platform combinations.
Installation Testing: the process of testing in which the tester tries to install or
deploy the module into the corresponding environment by following the
guidelines in the deployment document, and checks whether the installation is
successful.
Adhoc Testing: the process of testing in which, unlike formal
testing where a test case document is used, the application is tested without a test
case document, in order to cover features that are not covered
in that document. It is also intended to perform GUI testing, which may
involve cosmetic issues.
5.4 Sample Test Cases
Test Case 1 - Login
Test 1:
Incorrect input: An empty required field (user name and password).
Pass criteria: An appropriate error message should be displayed and the user
shouldn't be allowed to log in.
Correct input: Right user name and password.
Pass criteria: The user should be directed to the secure web page that the
user requested.
Test 2:
Incorrect input: Wrong user name and/or wrong password.
Pass criteria: The user shouldn't be allowed to log in to the system and an
appropriate error message should be displayed.
Correct input: Right user name and password.
Pass criteria: The user should be logged in to the system and directed to the
requested secure web page.
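The login test case above can be expressed as executable checks. The sketch below is an assumption-laden illustration in Python, not the project's ASP.NET code: the `login` function, the in-memory `USERS` store and the message texts are all hypothetical.

```python
# Hypothetical credential store, for illustration only.
USERS = {"admin": "s3cret"}


def login(username, password):
    """Return (allowed, message) for a login attempt."""
    if not username or not password:
        return False, "User name and password are required."
    if USERS.get(username) != password:
        return False, "Invalid user name or password."
    return True, "Login successful."


def run_login_tests():
    # Test 1: empty required fields are rejected with an error message.
    assert login("", "") == (False, "User name and password are required.")
    assert login("admin", "")[0] is False
    # Test 2: a wrong user name and/or wrong password is rejected.
    assert login("admin", "wrong") == (False, "Invalid user name or password.")
    assert login("nobody", "s3cret")[0] is False
    # Correct input: the right user name and password are accepted.
    assert login("admin", "s3cret") == (True, "Login successful.")
    return True


if __name__ == "__main__":
    print("login tests passed:", run_login_tests())
```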
Test Case 2 - Change Password
Incorrect input: An empty required field.
Pass criteria: An appropriate error message should be displayed and the user
shouldn't be allowed to create a password.
Correct input: Fill in all required fields in the correct format.
Pass criteria: The user information should be added into the database.
Test Case 3 - Add Products
Test 1:
Incorrect input: An empty required field, numeric characters.
Pass criteria: An appropriate error message should be displayed and the admin
should not be able to generate a report.
Correct input: Enter text as the product name.
Pass criteria: The admin should be allowed to generate output.
Test 2:
Incorrect input: An empty required field, text as string.
Pass criteria: An appropriate error message should be displayed
and the admin should not be able to write the price.
Correct input: Enter a numeric value as the price of the product.
Pass criteria: The admin should be able to give a price for products.
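The input rules behind Test Case 3 can be sketched as a small validation routine. This is a hypothetical Python illustration, not the project's code: the function name, the message texts, the reading of "numeric characters" as a purely numeric name, and the requirement that a price be positive are all assumptions.

```python
import math


def validate_product(name, price):
    """Return a list of error messages; an empty list means the input is accepted."""
    errors = []
    # Test 1: the product name must be present and must be text.
    # Assumption: a name consisting only of digits counts as "numeric".
    if not name.strip():
        errors.append("Product name is required.")
    elif name.strip().isdigit():
        errors.append("Product name must be text.")
    # Test 2: the price must be present and numeric.
    # Assumption: a price must also be a positive, finite number.
    if not price.strip():
        errors.append("Price is required.")
    else:
        try:
            value = float(price)
        except ValueError:
            errors.append("Price must be numeric.")
        else:
            if not math.isfinite(value) or value <= 0:
                errors.append("Price must be a positive number.")
    return errors


if __name__ == "__main__":
    print(validate_product("Pen", "2.50"))  # accepted
    print(validate_product("", "abc"))      # two error messages
```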
Test Case 4 - Add Subcategory
Test 1:
Incorrect input: An empty required field, numeric characters.
Pass criteria: An appropriate error message should be displayed and the admin
should not be able to create a subcategory.
Correct input: Enter text as the name of the subcategory.
Pass criteria: The admin should be allowed to create a subcategory.
CHAPTER - 6
CONCLUSION AND FUTURE WORK
6.1 Conclusion:
In this paper, we proposed a new Web data extraction approach, called
FiVaTech, to the problem of page-level data extraction. We formulate the page
generation model using an encoding scheme based on tree templates and schema,
which organize data by their parent node in the DOM trees. FiVaTech contains two
phases: phase I is merging input DOM trees to construct the fixed/variant pattern tree
and phase II is schema and template detection based on the pattern tree.
According to our page generation model, data instances of the same type have
the same path in the DOM trees of the input pages. Thus, the alignment of input DOM
trees can be implemented by string alignment at each internal node. We design a new
algorithm for multiple string alignment, which takes optional- and set-type data into
consideration. The advantage is that nodes with the same tag name can be better
differentiated by the subtree they contain. Meanwhile, the result of alignment makes
pattern mining more accurate. With the constructed fixed/variant pattern tree, we can
easily deduce the schema and template for the input Web pages.
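The observation that data instances of the same type share the same DOM path can be illustrated with a toy example. The sketch below is not the FiVaTech algorithm: it ignores optional and set-type data, performs no string alignment, and simply pairs text nodes positionally across two pages that are assumed to have identical structure. The class and function names and the two sample pages are hypothetical; only Python's standard html.parser module is used.

```python
from html.parser import HTMLParser


class PathCollector(HTMLParser):
    """Record (path, text) for every non-empty text node of a page."""

    def __init__(self):
        super().__init__()
        self.stack = []   # tags currently open, root first
        self.items = []   # (path, text) pairs in document order

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.items.append(("/".join(self.stack), text))


def text_by_path(html):
    parser = PathCollector()
    parser.feed(html)
    parser.close()
    return parser.items


def classify(page_a, page_b):
    """Label each path 'fixed' (template text) or 'variant' (data).

    Simplifying assumption: both pages yield the same sequence of paths.
    """
    result = []
    for (path_a, text_a), (path_b, text_b) in zip(text_by_path(page_a), text_by_path(page_b)):
        if path_a != path_b:
            raise ValueError("pages do not share the same path sequence")
        result.append((path_a, "fixed" if text_a == text_b else "variant"))
    return result


# Two hypothetical pages generated from the same template.
PAGE_1 = "<html><body><h1>Product</h1><p>Pen</p><span>Price</span><b>2</b></body></html>"
PAGE_2 = "<html><body><h1>Product</h1><p>Book</p><span>Price</span><b>12</b></body></html>"

if __name__ == "__main__":
    for path, kind in classify(PAGE_1, PAGE_2):
        print(path, kind)
```

Text that is identical across the pages is treated as part of the template, while text that differs under the same path is treated as data.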
6.2. Future work:
The proposed page generation model with tree-based template matches the
nature of the Web pages. Meanwhile, the merged pattern tree gives very good results
for schema and template deduction. For the sake of efficiency, we only use two or
three pages as input. Whether more input pages can improve the performance requires
further study. Also, extending the analysis to string contents inside text nodes and
matching schema that is produced due to variant templates are two interesting tasks
that we will consider next.
APPENDIX
A. USER MANUAL
The user manual makes it easier to interact with the system for those who do not
know its actual design. It is generally useful to the client or a third party for
performing basic operations and installing the system.
Advantages of user manual:
Illustrates the basic things about the system.
Guides the installation of the system in the specified organization.
Helps eliminate illegal operations of the system.
Helps gain optimum utilization of the system without any drawbacks.
Is the best source for eliminating errors.
Finally, it is a handy tool for managerial persons or higher authorities to
control and manage the system.
The installation process is given below.
Client side installation:
First of all, check whether IIS is enabled on the system.
Before installing Visual Studio 10, install SQL Server 2008.
After installing this software, add the backup database file to a new database
in SQL Server using the restore process.
To execute the project, open the project code in Visual Studio Solution
Explorer, then run the parent directory; the output will be displayed in the browser.
B. FORMS
Form Name: Welcome page
Description: Home page of the project.
Form B.1 Welcome page
Form Name: Login
Description: Administrator login to the website.
Form B.2 Login page
Form Name: Add sub category
Description: Add a sub category of product and its information to the website.
Form B.3 Add sub category
Form Name: Change Password
Description: It is to change the password of the administrator.
Form B.4 Change Password
Form Name: View Feedback
Description: It is to view the feedback of users.
Form B.5 View Feedback
Form Name: Add New Product
Description: It is to add new products to the website.
Form B.6 Add New Product
Form Name: View Products
Description: Here the user can view the products on our website.
Form B.7 View Products
C. REPORTS
Report Name: Administrator Home page.
Description: It confirms the admin's successful login.
Report C.1 Administrator Home page.
Report Name: Password change
Description: This report is generated when the admin changes the password
correctly.
Report C.2 Password change
Report Name: Add Product
Description: The admin added a product to the website successfully.
Report C.3 Add Product