Kevin McManus
Web Technologies
WWW Architecture
Figure: a client (platform: Windows, Mac, Unix, etc.; browser: IE, Mozilla, Opera, etc.) sends the request http://www.gre.ac.uk/about/ and receives the response <html>…</html>.
WWW Architecture
• Client-Server Request-Response architecture
• You request a web page
• e.g. http://www.gre.ac.uk/about/index.html
• the browser sends an HTTP request to the server, which returns an HTTP response
Internet Standards
• Internet Engineering Task Force (IETF)
http://www.ietf.org/
• founded 1986
• a large open international community of network
designers, operators, vendors, and researchers
concerned with the evolution of the Internet
architecture and the smooth operation of the Internet
• open to any interested individual or organisation
• establishes standards through working groups, mailing
lists and Request For Comments (RFC) documents
http://www.ietf.org/rfc.html
© K.Mc 2008 the University of Greenwich 4
Web Standards
• World Wide Web Consortium (W3C)
http://www.w3.org
• founded 1994 by Tim Berners-Lee
• currently hosted by MIT, ERCIM and Keio University
• open to corporate membership
• develops interoperable technologies to lead the Web
to its full potential
• produces specifications, standards, technical reports,
guidelines, software, and tools
• a forum for information, commerce, communication,
and collective understanding
• home of the Web Accessibility Initiative (WAI)
• access for all
Internet / Web
• The Internet (capital I) is a globally
interconnected network of computers
• employs a wide variety of communication
technologies
• wires, fibres, satellites and so on
• supports a number of protocols
• The Web (capital W) is a globally
interconnected network of information
• hypertext documents
• using the Internet to connect information
Clients & Servers
Web Browser
• Client-side application
• also known as a User Agent
• Requests resources from web servers
• knows how to parse and render HTML
• may know how to handle images
• may know how to handle client-side scripts
• may use plug-ins to handle other media formats
• Popular browsers
• have a graphical interface
• Mosaic - developed at NCSA circa 1992
• Modern browsers include: Internet Explorer, Mozilla, Netscape, Opera
• compatibility is an issue
• Uncommon browsers
• text only – Lynx
• assistive technologies – Jaws, IBM Home Page Reader
Web Server
• A program running on a server
• Services HTTP requests from web clients
• accepts the request
• returns the requested resource (if it can)
• HTTP response
• usually an HTML web page
• Originally developed as the HTTP daemon (HTTPd) at NCSA
• circa 1993, following work by Tim Berners-Lee at CERN
• Configurable
• deny or grant requests
• provide virtual hosts
• Logs requests and responses
• Popular servers
• Apache (70%), Microsoft Internet Information Services (IIS), Sun ONE
• Firewall
• a piece of hardware and/or software that prevents
communications forbidden by a security policy
• controls network traffic between different zones of trust
• prevents unauthorized access to a network from the Internet
Networks
• Network = an interconnected collection of
independent computers
• Why have networks?
• resource sharing, information sharing, reliability, communication
• Networked computers can offer so much more than
isolated machines
• Web technologies add:
• global information sharing: search engines, wikis
• applications that do not require a client-side installation
• new business models: e-commerce
• new education models: e-learning
• new ways of organising society: e-government, social networks
• entertainment: radio and television streaming and podcasts
Networks
• Network scope
• internet: a collection of connected networks
• Internet: a specific world-wide network based on IP,
used to connect companies, universities,
governments, organizations and individuals.
• grew from ARPANET, funded by the US DoD.
• intranet: a network based on Internet technologies
that is internal to a company or organization
• extranet: a network based on Internet technologies
that connects one company or organization to another
Figure: the TCP/IP protocol stack, present on both client and server: Application layer (HTTP), Transport layer (TCP), Internet layer (IP), Physical layer (Ethernet).
Physical Layer
• Defines the physical specifications for devices
• electrical, optical, electromagnetic, dimensional
• Establishes connections to a medium of communication
• copper wire, fibre-optic, wireless
• Ethernet is now the most common implementation
• (actually the data link layer in the OSI model)
• many variations on Ethernet
• IP addresses are mapped to Media Access Control (MAC) addresses
on the local network using the Address Resolution Protocol (ARP)
• 48-bit address of an Ethernet controller
• must be unique on a subnet
• usually permanently set at point of manufacture
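A MAC address is normally written as six hexadecimal octets. As a minimal sketch (the helper names `mac_to_int` and `is_locally_administered` are hypothetical), the Python below parses that notation into the 48-bit value and tests the "locally administered" bit (0x02 in the first octet), which is set when the address was assigned by software rather than at manufacture.

```python
def mac_to_int(mac: str) -> int:
    """Parse a MAC address such as '00:1A:2B:3C:4D:5E' into its 48-bit value."""
    octets = mac.replace("-", ":").split(":")   # accept ':' or '-' separators
    if len(octets) != 6:
        raise ValueError(f"expected 6 octets, got {len(octets)}: {mac!r}")
    value = 0
    for octet in octets:
        byte = int(octet, 16)
        if not 0 <= byte <= 0xFF:
            raise ValueError(f"octet out of range: {octet!r}")
        value = (value << 8) | byte              # shift in 8 bits at a time
    return value                                 # 6 octets x 8 bits = 48 bits


def is_locally_administered(mac: str) -> bool:
    """True if bit 0x02 of the first octet is set, i.e. the address was
    assigned by software rather than set at the point of manufacture."""
    first_octet = mac_to_int(mac) >> 40
    return bool(first_octet & 0x02)


print(hex(mac_to_int("00:1A:2B:3C:4D:5E")))          # 0x1a2b3c4d5e
print(is_locally_administered("02:00:5E:10:00:01"))  # True
```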
Internet Layer
• Internet Protocol (IP)
• Responsible for delivering packets from source to
destination
• across multiple network hops
• Not guaranteed to be reliable
• IP address: a 32-bit value, usually written in dotted decimal
notation as four 8-bit numbers (0 to 255), e.g. 130.51.12.4
• globally unique
• for computers connected to the Internet
• limited number of addresses – only 4 billion!
• Network Address Translation (NAT) lets many computers share one public address, easing the shortage
• IPv6 provides increased number of addresses
• 128-bit addresses
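Dotted decimal notation is only a readable form of a single 32-bit number. The sketch below (assuming Python 3.9 or later; `dotted_to_int` is a hypothetical helper) converts it by hand, cross-checks the result against the standard `ipaddress` module, and shows the address-space sizes behind the "4 billion" figure and IPv6.

```python
import ipaddress


def dotted_to_int(addr: str) -> int:
    """Convert dotted decimal notation (four 8-bit numbers) to one 32-bit value."""
    parts = [int(p) for p in addr.split(".")]
    if len(parts) != 4 or any(not 0 <= p <= 255 for p in parts):
        raise ValueError(f"not a dotted decimal IPv4 address: {addr!r}")
    value = 0
    for p in parts:
        value = (value << 8) | p   # shift in each 8-bit number
    return value


IPV4_SPACE = 2 ** 32    # 4,294,967,296 possible IPv4 addresses
IPV6_SPACE = 2 ** 128   # IPv6 addresses are 128 bits long

addr = "130.51.12.4"
print(dotted_to_int(addr))                                # 2184383492
print(int(ipaddress.IPv4Address(addr)))                   # same value, from the standard library
print(ipaddress.ip_address("192.168.1.10").is_private)    # True: a private range used behind NAT
print(ipaddress.ip_address("::1").max_prefixlen)          # 128
```

Private ranges such as 192.168.0.0/16 are reused inside many separate networks, which is why NAT is needed before those hosts can reach the Internet.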
Transport Layer
• Provides an efficient, reliable and cost-effective service
• Uses the sockets programming model
• Port numbers are used to identify the application
• well-known ports identify standard services
e.g. HTTP uses port 80, SMTP uses port 25
• can use other port numbers – if they are free
e.g. http://fred.foo.net:8080/bar/myfile.html
• Transmission Control Protocol (TCP)
• connection-oriented byte stream
• guaranteed reliability
• User Datagram Protocol (UDP)
• connectionless
• no guarantee but lower overheads
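The port a client connects to comes either from the URL itself or from the scheme's well-known port. As a minimal sketch, the hypothetical helper `server_endpoint` below uses the standard `urllib.parse.urlsplit` and an assumed table of defaults (HTTP 80 as on the slide, plus HTTPS 443).

```python
from urllib.parse import urlsplit

# Assumed table of well-known default ports for the schemes used in this lecture.
DEFAULT_PORTS = {"http": 80, "https": 443}


def server_endpoint(url: str) -> tuple[str, int]:
    """Return the (host, port) pair a client would open a TCP connection to."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.hostname, parts.port            # explicit port, e.g. :8080
    return parts.hostname, DEFAULT_PORTS[parts.scheme]  # well-known port for the scheme


print(server_endpoint("http://fred.foo.net:8080/bar/myfile.html"))  # ('fred.foo.net', 8080)
print(server_endpoint("http://www.gre.ac.uk/about/"))                # ('www.gre.ac.uk', 80)
```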
Application Layer
• Telnet - remote terminal
• File Transfer Protocol (FTP)
• Network News Transfer Protocol (NNTP)
• Simple Network Management Protocol (SNMP)
• Simple Mail Transfer Protocol (SMTP)
• Post Office Protocol (POP)
• Internet Message Access Protocol (IMAP)
• Secure Shell (SSH) – secure terminal
• Hypertext Transfer Protocol (HTTP) – the principal
protocol of the World Wide Web
HTTP
• a.k.a. Hypertext Transport Protocol
• HTTP is a simple stateless request-response protocol
• A web client (user agent) requests a resource identified by a
uniform resource locator (URL)
• The web server identified in the URL responds with the file
identified in the URL
• the file may contain static data
• HTML pages, GIFs, JPEGs, Microsoft Word documents, Adobe PDF
documents, etc., etc.
• the file may be a program that runs on the server to output data
• ASP, PHP, Perl, JSP, etc., etc.
• HTTP/1.0 highly successful
• HTTP/1.1 introduced to address flaws in 1.0 and improve network
performance
• pipelining requests and responses
HTTP Methods
• GET, POST, HEAD, PUT, DELETE, TRACE, OPTIONS, CONNECT
• GET and POST are both used to:
• request a resource from a server
• send data with the request
• as name-value pairs
• GET appends the name-value pairs to the URL
• visible in the browser
• can be bookmarked and cached
• safe, idempotent
• POST sends the name-value pairs in the message body, after the HTTP headers
• not normally cached
• can carry a larger payload
• Differences between GET and POST are subtle and significant
• we will look closely at this later
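The same form data travels in different places for GET and POST. The sketch below is illustrative only: the form fields, `www.example.com` and `/search.php` are hypothetical, and `get_url` and `post_request` are made-up helpers. It uses the standard `urllib.parse.urlencode` to produce the name-value encoding in both cases.

```python
from urllib.parse import urlencode

fields = {"name": "Fred Bloggs", "course": "web"}   # hypothetical form data


def get_url(base: str, data: dict) -> str:
    """GET: the name-value pairs are appended to the URL as a query string."""
    return base + "?" + urlencode(data)


def post_request(host: str, path: str, data: dict) -> bytes:
    """POST: the same pairs travel in the message body, after the headers."""
    body = urlencode(data).encode("ascii")
    head = (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"                       # blank line ends the headers
    )
    return head.encode("ascii") + body


print(get_url("http://www.example.com/search.php", fields))
# http://www.example.com/search.php?name=Fred+Bloggs&course=web
print(post_request("www.example.com", "/search.php", fields).decode("ascii"))
```

Only the GET version leaves the data in the URL, which is why it shows up in the browser, in bookmarks and in caches.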
HTTP Request
• request line: Method, File name, HTTP version
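A request line is just the method, file name and HTTP version separated by spaces, followed by headers and a blank line. As a minimal sketch (the helpers `build_get_request` and `parse_request_line` are hypothetical), this builds a GET request for a URL and splits the request line back into its three parts.

```python
from urllib.parse import urlsplit


def build_get_request(url: str, version: str = "HTTP/1.0") -> str:
    """Build a minimal GET request: request line, Host header, blank line."""
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return (
        f"GET {path} {version}\r\n"   # Method, File name, HTTP version
        f"Host: {parts.netloc}\r\n"   # mandatory in HTTP/1.1, harmless in HTTP/1.0
        "\r\n"                        # blank line ends the request headers
    )


def parse_request_line(request: str) -> tuple[str, str, str]:
    """Split the first line of a request into (method, file name, version)."""
    request_line = request.split("\r\n", 1)[0]
    method, path, version = request_line.split(" ")
    return method, path, version


req = build_get_request("http://www.gre.ac.uk/about/index.html")
print(parse_request_line(req))   # ('GET', '/about/index.html', 'HTTP/1.0')
```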
HTTP Response
• status line: HTTP version, Status code, Reason phrase
• followed by headers, e.g.
HTTP/1.0 200 OK
Date: Thu, 21 Sep 2006 22:06:05 GMT
Server: Apache/1.3.33 (Unix) PHP/4.3.10
Connection: close
Content-Type: text/html
ETag: "5d150-141c-450f244f"
Last-Modified: Mon, 18 Sep 2006 22:57:19 GMT
Content-Length: 5184
• other status codes include 201 Created and 401 Unauthorized
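The example response head can be pulled apart with a few string operations. This is a teaching sketch with a hypothetical helper, `parse_response_head`; a real client would normally rely on a library such as Python's `http.client`. It splits the status line into version, code and reason phrase, and stores header names in lower case because HTTP header names are case-insensitive.

```python
RESPONSE = """HTTP/1.0 200 OK
Date: Thu, 21 Sep 2006 22:06:05 GMT
Server: Apache/1.3.33 (Unix) PHP/4.3.10
Connection: close
Content-Type: text/html
ETag: "5d150-141c-450f244f"
Last-Modified: Mon, 18 Sep 2006 22:57:19 GMT
Content-Length: 5184
"""


def parse_response_head(raw: str) -> tuple[str, int, str, dict[str, str]]:
    """Split a response head into version, status code, reason phrase and headers."""
    lines = raw.strip().splitlines()                  # handles both \n and \r\n
    version, code, reason = lines[0].split(" ", 2)    # reason phrase may contain spaces
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")          # split at the first colon only
        headers[name.strip().lower()] = value.strip() # header names are case-insensitive
    return version, int(code), reason, headers


version, code, reason, headers = parse_response_head(RESPONSE)
print(version, code, reason)            # HTTP/1.0 200 OK
print(headers["content-type"])          # text/html
print(int(headers["content-length"]))   # 5184
```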
HTTP
• HTTP is a stateless protocol
• Each HTTP request is independent of previous and subsequent
requests
• HTTP/1.0 defaults to Connection: close
• closes the channel of communication immediately after a response
• Connection: keep-alive was introduced to enable persistent
connections
• no need to re-negotiate a connection for each request
• a connection can be re-used for multiple requests
• HTTP/1.1 connections are persistent (keep-alive) by default, for efficiency
• supports pipelining: several requests can be sent over one connection without waiting for each response
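The default behaviour of the two HTTP versions can be written as a small decision rule. This is a simplified sketch using a hypothetical function, `connection_is_persistent`. It looks only at the version and the `Connection` header. Real servers also need to know where each message ends, for example from `Content-Length`, before they can keep a connection open.

```python
from typing import Optional


def connection_is_persistent(version: str, connection: Optional[str]) -> bool:
    """Apply the default-connection rules from the slide to one HTTP message."""
    tokens = {t.strip().lower() for t in (connection or "").split(",") if t.strip()}
    if version == "HTTP/1.1":
        return "close" not in tokens     # HTTP/1.1: persistent unless 'Connection: close'
    return "keep-alive" in tokens        # HTTP/1.0: closed unless 'Connection: keep-alive'


print(connection_is_persistent("HTTP/1.0", None))          # False
print(connection_is_persistent("HTTP/1.0", "Keep-Alive"))  # True
print(connection_is_persistent("HTTP/1.1", None))          # True
print(connection_is_persistent("HTTP/1.1", "close"))       # False
```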
State Preservation
• State preservation mechanisms come in three
basic variations:
• cookies
• store a small amount of information on the client
• sent to the server with each HTTP request
• session variables
• a unique identifier is used to associate information stored on
the server with a particular client
• passing data at each request-response cycle
• store information in the web page
• appending data to a URL
• hidden fields in HTML forms
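Cookies and session variables are often combined: the cookie carries only a unique identifier, and the data itself stays on the server. The sketch below is a minimal illustration of that pattern using the standard `http.cookies.SimpleCookie` and `secrets` modules. The cookie name `SESSIONID`, the in-memory `SESSIONS` store and the helper functions are all hypothetical.

```python
import secrets
from http.cookies import SimpleCookie

SESSIONS: dict[str, dict] = {}   # server-side store; in memory for illustration only


def start_session() -> str:
    """Create an empty session and return a Set-Cookie header value carrying its id."""
    session_id = secrets.token_hex(16)          # hard-to-guess unique identifier
    SESSIONS[session_id] = {}
    cookie = SimpleCookie()
    cookie["SESSIONID"] = session_id            # hypothetical cookie name
    cookie["SESSIONID"]["path"] = "/"
    return cookie["SESSIONID"].OutputString()   # 'SESSIONID=<id>; Path=/'


def session_for(cookie_header: str) -> dict:
    """Find the session data for the id the browser sends back in its Cookie header."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    return SESSIONS[cookie["SESSIONID"].value]


# Response 1: the server sets the cookie.
set_cookie = start_session()
# Request 2: the browser returns just the name=value part in a Cookie header.
cookie_header = set_cookie.split(";")[0]
session_for(cookie_header)["basket"] = ["book"]
print(session_for(cookie_header))   # {'basket': ['book']}
```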
HTTPS
• A secure version of HTTP
• syntactically identical to HTTP
• Allows client and server to exchange data with
confidence that the data was neither modified nor
intercepted during transmission
• essential when communicating sensitive information over the
Web
• Implements HTTP over Secure Sockets Layer (SSL)
• Transport Layer Security (TLS) is the standardised successor to SSL
• uses public key cryptography to authenticate the server and agree a session key, which then encrypts the data during transmission
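Because HTTPS is HTTP carried over TLS, a client can send exactly the same request text once the socket has been wrapped. The following is a minimal sketch using Python's standard `ssl` and `socket` modules. The helpers `make_client_context` and `https_get` are hypothetical, and `https_get` needs network access to run.

```python
import socket
import ssl


def make_client_context() -> ssl.SSLContext:
    """TLS settings for a client: verify the server certificate and host name."""
    context = ssl.create_default_context()              # CERT_REQUIRED + check_hostname
    context.minimum_version = ssl.TLSVersion.TLSv1_2    # refuse SSL and early TLS versions
    return context


def https_get(host: str, path: str = "/") -> bytes:
    """Send a plain HTTP request over an encrypted TLS connection (port 443)."""
    context = make_client_context()
    with socket.create_connection((host, 443)) as raw:
        with context.wrap_socket(raw, server_hostname=host) as tls:
            # The request text is exactly the same as for unencrypted HTTP.
            tls.sendall(f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii"))
            chunks = []
            while chunk := tls.recv(4096):   # HTTP/1.0: the server closes when done
                chunks.append(chunk)
    return b"".join(chunks)
```

Calling `https_get("www.example.com")` would return the raw response bytes, including the status line and headers.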
The URL
http://staffweb.cms.gre.ac.uk/~mk05/page1.html
is the same as
http://193.60.76.168/~mk05/page1.html
• the Domain Name System (DNS) translates the host name into its IP address
Hypertext
• Conventional text has a single linear narrative
path
• time line
• beginning – middle - end
• Hypertext
• multiple paths
• may be read in any order
• possibly an inappropriate order
• not a new concept
• an indexed or referenced document
• encyclopaedia, academic text
• Computers are very good at traversing indexes
• first computer hypertext system (the Hypertext Editing System) developed at Brown University, with IBM support, in 1967
• required a mainframe computer
• first popular system Apple HyperCard in 1987
HyperText Markup Language
HTML
• Originally defined by Tim Berners-Lee circa 1992
• further developed by the IETF
• an application of the Standard Generalized Markup Language
(SGML)
• HTML 4.01 (1999) is the basis of the international standard ISO/IEC 15445:2000
• later specifications are maintained by the W3C
• Tag based markup
• tags define the structure of a page
• metadata describing how to render the page
• headings, paragraphs, lists, etc.
• tags can have attributes
• provide extra clues about page rendering
e.g. colour, font, size, decoration
• anchor tags link to other (parts of) pages
• hypertext
HTML
<HTML>
<HEAD>
<TITLE>page1.html</TITLE>
</HEAD>
<BODY BGCOLOR="#FFFFDD">
<H1>Simple Example HTML page</H1>
<P>
This <I>paragraph</I> contains an anchor tag<BR>
<A HREF="page2.html">click here for the next page</A>
</P>
</BODY>
</HTML>
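A browser finds the hypertext links in a page by parsing its anchor tags and then resolving each `HREF` against the page's own URL. As a minimal sketch, the code below uses the standard `html.parser.HTMLParser` and `urllib.parse.urljoin` on a copy of the example page. `LinkCollector` is a hypothetical class, and the base URL is assumed to be where page1.html is served.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every anchor tag, i.e. the page's hypertext links."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names: <A HREF=...> arrives as "a" / "href".
        if tag == "a":
            self.links.extend(value for name, value in attrs if name == "href" and value)


PAGE = """<HTML><HEAD><TITLE>page1.html</TITLE></HEAD>
<BODY BGCOLOR="#FFFFDD">
<P>This <I>paragraph</I> contains an anchor tag<BR>
<A HREF="page2.html">click here for the next page</A></P>
</BODY></HTML>"""

collector = LinkCollector()
collector.feed(PAGE)
base = "http://staffweb.cms.gre.ac.uk/~mk05/page1.html"   # assumed location of page1.html
for link in collector.links:
    print(urljoin(base, link))   # http://staffweb.cms.gre.ac.uk/~mk05/page2.html
```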
Figure: generations of the Web (labels: HTML, XML; user generated content; service oriented architectures; Generation 4; Web 2.0).
Further reading
• The World Wide Web Consortium
http://www.w3.org/
http://www.w3.org/WAI/
http://www.w3.org/Addressing/
• Wikipedia
http://www.wikipedia.org/
• Mozilla
http://www.mozilla.org
• Apache
http://www.apache.org
Questions
• What is the sequence of events in a web browser such
as Mozilla when you follow a link to the following URL?
http://staffweb.cms.gre.ac.uk/~k.mcmanus