
Web Technologies

Kevin McManus
Web Technologies

WWW Architecture
Client
• Platform: Windows, Mac, Unix, etc.
• Browser: IE, Mozilla, Opera, etc.
Network
• HTTP over TCP/IP
• Request: http://www.gre.ac.uk/about/
• Response: <html>…</html>
Server
• Platform: Windows, Mac, Unix, etc.
• Web Server: Apache, IIS, Xitami, etc.

© K.Mc 2008 the University of Greenwich 2



WWW Architecture
• Client-Server Request-Response architecture
• You request a web page
• e.g. http://www.gre.ac.uk/about/index.html
• HTTP request

• The web server responds with data


• HTTP response
• usually in the form of a web page (HTML document)
• could be any file format
• web page is written using HyperText Markup Language (HTML)

• Web pages are identified by a Uniform Resource Locator (URL)


• protocol: e.g. http
• web server: e.g. www.gre.ac.uk
• [machine name].[domain name]
• web page: e.g. about/index.html


Internet Standards
• Internet Engineering Task Force (IETF)
http://www.ietf.org/
• founded 1986
• a large open international community of network
designers, operators, vendors, and researchers
concerned with the evolution of the Internet
architecture and the smooth operation of the Internet
• open to any interested individual or organisation
• establishes standards through working groups, mailing
lists and Request For Comments (RFC) documents
http://www.ietf.org/rfc.html

Web Standards
• World Wide Web Consortium (W3C)
http://www.w3.org
• founded 1994 by Tim Berners-Lee
• currently hosted by MIT, ERCIM and Keio University
• open to corporate membership
• develops interoperable technologies to lead the Web
to its full potential
• produces specifications, standards, technical reports,
guidelines, software, and tools
• a forum for information, commerce, communication,
and collective understanding
• home of the Web Accessibility Initiative (WAI)
• access for all

Internet / Web
• The Internet (capital I) is a globally
interconnected network of computers
• employs a wide variety of communication
technologies
• wires, fibres, satellites and so on
• supports a number of protocols
• The Web (capital W) is a globally
interconnected network of information
• hypertext documents
• using the Internet to connect information

Clients & Servers
Similarities

• Client and server computers both usually have:


• hardware
• Central Processing Unit (CPU)
• e.g. Intel Pentium, AMD Athlon, IBM PPC, Sun Sparc
• memory
• I/O
• Visual Display Unit (VDU), storage (fixed, removable), network
• bus to connect it all together
• software
• multi-tasking operating system
• Unix, Linux, NT, XP
• file system
• applications

Clients & Servers
Differences
• Clients
• generally support a single user
• optimized for responsiveness to user
• have a user interface, graphics
• have client applications
• e.g. web browsers
• Servers
• support multiple users
• optimized for throughput
• more: CPUs (SMP), memory, disks (SANs), I/O
• provide services
• e.g. web, file, print, database, e-mail, telnet, directory
• provide a high quality of service
• RAID, UPS, redundant power supplies, hot swap devices


Web Browser
• Client-side application
• also known as a User Agent
• Requests resources from web servers
• knows how to parse and render HTML
• may know how to handle images
• may know how to handle client-side scripts
• may use plug-ins to handle other media formats
• Popular browsers
• have a graphical interface
• Mosaic - released by NCSA in 1993
• Modern browsers include: Internet Explorer, Mozilla, Netscape, Opera
• compatibility is an issue
• Uncommon browsers
• text only – Lynx
• assistive technologies – Jaws, IBM Home Page Reader


Web Server
• A program running on a server
• Services HTTP requests from web clients
• accepts the request
• returns the requested resource (if it can)
• HTTP response
• usually an HTML web page
• Originally developed as the HTTP daemon (HTTPd) at NCSA
• circa 1993 following work by Tim Berners-Lee at CERN
• Configurable
• deny or grant requests
• provide virtual hosts
• Logs requests and responses
• Popular servers
• Apache (70%), Microsoft Internet Information Server (IIS), Sun One


Proxy Servers & Firewalls


• Proxy Server
• a server that sits between a client and the Internet
• improves performance by caching frequently accessed resources
• essential to achieve scalability of the Web
• can filter requests to prevent access to certain web sites
• used to implement censorship
• can alter the client's request or the server's response
• useful but open to abuse

• Firewall
• a piece of hardware and/or software that prevents
communications forbidden by a security policy
• controls network traffic between different zones of trust
• prevents unauthorized access to a network from the Internet


Networks
• Network = an interconnected collection of
independent computers
• Why have networks?
• resource sharing, information sharing, reliability, communication
• Networked computers can offer so much more than
isolated machines
• Web technologies add:
• global information sharing: search engines, wikis
• applications that do not require a client-side installation
• new business models: e-commerce
• new education models: e-learning
• new ways of organising society: e-government, social networks
• entertainment: radio and television streaming and podcasts


Networks
• Network scope
• internet: a collection of connected networks
• Internet: a specific world-wide network based on IP,
used to connect companies, universities,
governments, organizations and individuals.
• grew from ARPANET, funded by the US DoD.
• intranet: a network based on Internet technologies
that is internal to a company or organization
• extranet: a network based on Internet technologies
that connects one company or organization to another


Network Protocol Stack

Layer               Protocol (at each end of the connection)
Application layer   HTTP
Transport layer     TCP
Internet layer      IP
Physical layer      Ethernet

Networks

Physical Layer
• Defines the physical specifications for devices
• electrical, optical, electromagnetic, dimensional
• Establishes connections to a medium of communication
• copper wire, fibre-optic, wireless
• Ethernet is now the most common implementation
• (actually the data link layer in the OSI model)
• many variations on Ethernet
• the Address Resolution Protocol (ARP) maps IP addresses to Media Access Control
(MAC) addresses on the local network
• 48-bit address of an Ethernet controller
• must be unique on a subnet
• usually permanently set at point of manufacture

Networks

Internet Layer
• Internet Protocol (IP)
• Responsible for communicating packets from source to
destination
• across multiple network hops
• Not guaranteed to be reliable
• IP address: 32 bit value usually written in dotted decimal
notation as four 8-bit numbers (0 to 255) e.g. 130.51.12.4
• globally unique
• for computers connected to the Internet
• limited number of addresses – only 4 billion!
• Network Address Translation (NAT) used to increase capacity
• IPv6 provides increased number of addresses
• 128 bit addresses
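
The address arithmetic above can be checked with a short sketch using Python's standard ipaddress module. It uses the slide's example address; the variable names are illustrative only.

```python
import ipaddress

# The slide's example address, written in dotted decimal notation.
addr = ipaddress.IPv4Address("130.51.12.4")

# An IPv4 address is a single 32-bit integer; each dotted field is one byte.
as_int = int(addr)
octets = list(addr.packed)

# Size of each address space: 2**32 for IPv4, 2**128 for IPv6.
ipv4_space = 2 ** 32
ipv6_space = 2 ** 128

print(as_int, octets)
print(f"IPv4 addresses: {ipv4_space:,}")
print(f"IPv6 addresses: {ipv6_space:.3e}")
```

The IPv4 total is a little under 4.3 billion, which is the "only 4 billion" limit that NAT and IPv6 are meant to relieve.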

Networks

Transport Layer
• Provides an efficient, reliable and cost-effective service
• Uses the sockets programming model
• Port numbers are used to identify the application
• well-known ports identify standard services
e.g. HTTP uses port 80, SMTP uses port 25
• can use other port numbers – if they are free
e.g. http://fred.foo.net:8080/bar/myfile.html
• Transmission Control Protocol (TCP)
• connection-oriented byte stream
• guaranteed reliability
• User Datagram Protocol (UDP)
• connectionless
• no guarantee but lower overheads
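
As a minimal sketch of the sockets model, the example below runs a one-shot TCP server and a client on the loopback address. The port is chosen by the operating system, and the helper name serve_once and the upper-casing behaviour are invented for illustration.

```python
import socket
import threading

def serve_once(listener: socket.socket) -> None:
    """Accept one TCP connection and send back what arrives, upper-cased."""
    conn, _peer = listener.accept()
    with conn:
        data = conn.recv(1024)
        conn.sendall(data.upper())

# Server side: bind to the loopback address; port 0 asks the OS for any free port.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
port = listener.getsockname()[1]
worker = threading.Thread(target=serve_once, args=(listener,))
worker.start()

# Client side: the (address, port) pair identifies the application to talk to.
with socket.create_connection(("127.0.0.1", port)) as client:
    client.sendall(b"hello over tcp")
    reply = client.recv(1024)

worker.join()
listener.close()
print(port, reply)
```

Replacing SOCK_STREAM with SOCK_DGRAM would select UDP instead, at which point there is no connection to accept and no delivery guarantee.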

Networks

Application Layer
• Telnet - remote terminal
• File Transfer Protocol (FTP)
• Network News Transfer Protocol (NNTP)
• Simple Network Management Protocol (SNMP)
• Simple Mail Transfer Protocol (SMTP)
• Post Office Protocol (POP)
• Internet Message Access Protocol (IMAP)
• Secure Shell (SSH) – secure terminal
• Hypertext Transfer Protocol (HTTP) – the principal
protocol of the World Wide Web

Hypertext Transfer Protocol

HTTP
• a.k.a. Hypertext Transport Protocol
• HTTP is a simple stateless request-response protocol
• A web client (user agent) requests a resource identified by a
uniform resource locator (URL)
• The web server identified in the URL responds with the file
identified in the URL
• the file may contain static data
• HTML pages, GIFs, JPEGs, Microsoft Word documents, Adobe PDF
documents, etc., etc.
• the file may be a program that runs on the server to output data
• ASP, PHP, Perl, JSP, etc., etc.
• HTTP/1.0 highly successful
• HTTP/1.1 introduced to address flaws in 1.0 and improve network
performance
• pipelining requests and responses


HTTP Methods
• GET, POST, HEAD, PUT, DELETE, TRACE, OPTIONS, CONNECT
• GET and POST are both used to...
• request a resource from a server
• send data with the request
• as name value pairs
• GET appends name value pairs to the URL
• visible in the browser
• can be bookmarked and cached
• safe, idempotent
• POST sends name value pairs in the request body, after the HTTP headers
• not cached
• can carry larger payload
• Differences between GET and POST are subtle and significant
• we will look closely at this later
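
The difference in where the name value pairs travel can be sketched with Python's standard urllib. The host and the fruit and user pairs are borrowed from the URL example elsewhere in these slides; no request is actually sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

fields = {"fruit": "plum", "user": "joe"}
encoded = urlencode(fields)  # name=value pairs joined with "&"

# GET: the pairs travel in the URL, after a "?".
get_req = Request("http://w3.foo.net:8080/bar/index.php?" + encoded)

# POST: the same pairs travel in the request body instead.
post_req = Request("http://w3.foo.net:8080/bar/index.php",
                   data=encoded.encode("ascii"))

print(get_req.get_method(), get_req.full_url)
print(post_req.get_method(), post_req.data)
```

Because the GET form carries everything in the URL, it is the one that can be bookmarked.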


HTTP Request
Request line: Method, File name, HTTP version

GET /~k.mcmanus/index.html HTTP/1.1

Headers

Host: staffweb.cms.gre.ac.uk
Connection: close
Accept: text/xml,text/html,text/plain,image/png,*/*
Accept-Language: en-gb,en
User-Agent: Mozilla/4.0 (compatible;MSIE 6.0;Windows NT 5.0)
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*
If-Modified-Since: Mon, 18 Sep 2006 22:57:19 GMT
Referer: http://web-sniffer.net

Blank line

Data – none for GET
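
To make the layout of the message concrete, here is a small sketch that splits a trimmed copy of the request above into its request line, headers and data. It is plain string handling for illustration, not a full HTTP parser.

```python
# A trimmed copy of the slide's request; lines end with CRLF on the wire.
raw_request = (
    "GET /~k.mcmanus/index.html HTTP/1.1\r\n"
    "Host: staffweb.cms.gre.ac.uk\r\n"
    "Connection: close\r\n"
    "Accept-Language: en-gb,en\r\n"
    "\r\n"
)

# The blank line separates the headers from the (here empty) data.
head, _, body = raw_request.partition("\r\n\r\n")
request_line, *header_lines = head.split("\r\n")

method, path, version = request_line.split(" ")
headers = dict(line.split(": ", 1) for line in header_lines)

print(method, path, version)
print(headers["Host"], repr(body))
```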



HTTP Response
Status line: HTTP version, Status code, Reason phrase

HTTP/1.0 200 OK

Headers

Date: Thu, 21 Sep 2006 22:06:05 GMT
Server: Apache/1.3.33 (Unix) PHP/4.3.10
Connection: close
Content-Type: text/html
ETag: "5d150-141c-450f244f"
Last-Modified: Mon, 18 Sep 2006 22:57:19 GMT
Content-Length: 5184

Data

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict…
<html xmlns="http://www.w3.org/1999/xhtml">
...
</html>

HTTP Server Status Codes


Code  Description
200   OK
201   Created
301   Moved Permanently
302   Moved Temporarily (renamed Found in HTTP/1.1)
400   Bad Request – not understood
401   Unauthorized – authentication required
403   Forbidden – not authorized
404   Not Found
500   Internal Server Error
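
The codes listed here are also available in Python's standard http module, which the sketch below uses to print each reason phrase and its class. Python uses the HTTP/1.1 phrases, so 302 is reported as "Found".

```python
from http import HTTPStatus

codes = [200, 201, 301, 302, 400, 401, 403, 404, 500]

for code in codes:
    status = HTTPStatus(code)
    # The first digit gives the class: 2xx success, 3xx redirection,
    # 4xx client error, 5xx server error.
    print(code, status.phrase, f"{code // 100}xx")

phrases = {code: HTTPStatus(code).phrase for code in codes}
```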


HTTP
• HTTP is a stateless protocol
• Each HTTP request is independent of previous and subsequent
requests
• HTTP/1.0 defaults to Connection: close
• closes the channel of communication immediately after a response
• Connection: keep-alive was introduced to enable persistent
connections
• no need to re-negotiate a connection for each request
• a connection can be re-used for multiple requests
• HTTP/1.1 defaults to keep-alive for efficiency
• supports pipelining, which lets a client send several requests on one connection without waiting for each response

• The stateless nature of HTTP has a big impact on how web
applications are designed
• we will look very closely at this
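
Persistent connections can be observed with a throwaway server on the loopback address, sketched below with Python's standard http.server and http.client. The handler class name and the page paths are made up for the demonstration.

```python
import http.client
import http.server
import threading

class Hello(http.server.BaseHTTPRequestHandler):
    # Speaking HTTP/1.1 makes the connection persistent by default.
    protocol_version = "HTTP/1.1"

    def do_GET(self):
        body = f"you asked for {self.path}".encode("ascii")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        # A persistent connection needs Content-Length so the client
        # knows where one response ends and the next begins.
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet

server = http.server.HTTPServer(("127.0.0.1", 0), Hello)
threading.Thread(target=server.serve_forever, daemon=True).start()

conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
bodies, sockets = [], []
for path in ("/page1.html", "/page2.html"):
    conn.request("GET", path)
    response = conn.getresponse()
    bodies.append(response.read().decode("ascii"))
    sockets.append(conn.sock)

# Both requests travelled over the same TCP connection.
same_connection = sockets[0] is sockets[1]
conn.close()
server.shutdown()
server.server_close()
print(bodies, same_connection)
```

Although the two requests share a connection, the server still treats each one independently, so the protocol remains stateless.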


State Preservation
• State preservation mechanisms come in three
basic variations:
• cookies
• store a small amount of information on the client
• sent to the server at each HTTP request
• session variables
• a unique identifier is used to associate information stored on
the server with a particular client
• passing data at each request-response cycle
• store information in the web page
• appending data to a URL
• hidden fields in HTML forms
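
The first two mechanisms are often combined: a cookie carries only a session identifier, and the state itself stays on the server. A minimal sketch with Python's standard http.cookies follows; the cookie name session_id, its value and the sessions dictionary are all hypothetical.

```python
from http.cookies import SimpleCookie

# Server side: build a Set-Cookie header for the response.
outgoing = SimpleCookie()
outgoing["session_id"] = "abc123"
outgoing["session_id"]["path"] = "/"
set_cookie_header = outgoing.output()

# Client side: on later requests the browser sends the pair back
# in a Cookie header, which the server parses again.
incoming = SimpleCookie()
incoming.load("session_id=abc123")
session_id = incoming["session_id"].value

# The identifier is only a key; the actual state stays on the server.
sessions = {"abc123": {"basket": ["plum"]}}
state = sessions[session_id]

print(set_cookie_header)
print(state)
```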

HTTPS
• A secure version of HTTP
• syntactically identical to HTTP
• Allows client and server to exchange data with
confidence that the data was neither modified nor
read by a third party during transmission
• essential when communicating sensitive information over the
Web
• Implements HTTP over Secure Sockets Layer (SSL)
• SSL has been succeeded by the Transport Layer Security protocol (TLS)
• uses public key cryptography to authenticate the server and agree a session key, which then encrypts the data during transmission


URIs, URLs and URNs


• Uniform Resource Identifier (URI = URL or URN)
• generic term for all resource names and addresses

• Uniform Resource Locator (URL)


• a set of URI schemes that have explicit instructions on how to
access a resource over the Internet
• globally unique
http://w3.foo.net:8080/bar/index.php?fruit=plum&user=joe
[protocol]://[host]:[port]/[file path]?[arg]=[val]&[arg]=[val]

• Uniform Resource Name (URN)


• a URI that has an institutional commitment to availability and
persistence
• http://www.w3.org/Addressing
• http://www.w3.org/Addressing/URL/5_BNF.html
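
The URL pattern above can be taken apart with Python's standard urllib.parse, as sketched here for the slide's own example URL.

```python
from urllib.parse import urlsplit, parse_qs

url = "http://w3.foo.net:8080/bar/index.php?fruit=plum&user=joe"
parts = urlsplit(url)

print(parts.scheme)    # [protocol]
print(parts.hostname)  # [host]
print(parts.port)      # [port], as an integer
print(parts.path)      # [file path]

# The query string holds the [arg]=[val] pairs.
args = parse_qs(parts.query)
print(args)
```

parse_qs returns a list for each argument because the same name may appear more than once in a query string.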

Multipurpose Internet Mail Extensions (MIME)
• Originally designed for email, also used for HTTP
• Tells the browser how to interpret the incoming data
• Defines types of data/documents
• text - text/plain, text/html, text/xml
• image formats - image/gif, image/jpeg
• audio formats - audio/x-aiff, audio/mpeg
• binary data - application/octet-stream
• Applied by the web server according to the filename
extension
e.g. a file called daisy.png will be sent with the MIME type image/png
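
Python's standard mimetypes module performs the same kind of extension lookup, so it can illustrate the mapping. Apart from daisy.png, the file names below are invented, and a real web server uses its own configured table rather than this module.

```python
import mimetypes

for name in ("daisy.png", "index.html", "notes.txt", "photo.jpeg"):
    # guess_type returns (type, encoding); only the type matters here.
    mime_type, _encoding = mimetypes.guess_type(name)
    print(name, "->", mime_type)

daisy_type = mimetypes.guess_type("daisy.png")[0]
```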


Domain Name System DNS


• Human-friendly domain names,
gre.ac.uk
• Globally unique identification of computers
bukowski.gre.ac.uk
• Hierarchical name space with limited root names
• organisational: .com .net .gov .edu .org .mil
• national: .uk .jp .de .fr .tv etc.
• The Internet Corporation for Assigned Names and Numbers (ICANN)
assumes responsibility for global coordination of the namespace
• ICANN assigns control of each namespace to a registration authority
• e.g. VeriSign for .com, Nominet for .uk
• the Joint Academic Network (JANET) acts as authority for .ac.uk
• JANET devolves authority for .gre.ac.uk to the University of
Greenwich


Domain Name System DNS


• DNS servers map domain names to IP addresses
• usually using the Berkeley Internet Name Domain (BIND) software
• actually mapping fully qualified machine names to IP addresses
• Web client contacts its local DNS server to translate the
host name part of a URL into an IP address
• If the local DNS server cannot resolve the address then the
request is passed to DNS at the next level of controlling authority
• resolved addresses are cached by the local DNS server
• and by the browser
• The browser can then send an HTTP request to the IP address

http://staffweb.cms.gre.ac.uk/~mk05/page1.html
Is the same as…
http://193.60.76.168/~mk05/page1.html
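
The caching behaviour described above can be modelled without touching the network. The sketch below is a toy, not a DNS implementation: the class CachingResolver and the AUTHORITATIVE table are hypothetical, and the single record is the slide's own example mapping.

```python
from typing import Callable, Dict

class CachingResolver:
    """Toy local DNS server: answers from its cache, else asks upstream."""

    def __init__(self, upstream: Callable[[str], str]) -> None:
        self.upstream = upstream
        self.cache: Dict[str, str] = {}
        self.upstream_queries = 0

    def resolve(self, name: str) -> str:
        if name not in self.cache:
            # Cache miss: pass the request to the next level of authority.
            self.upstream_queries += 1
            self.cache[name] = self.upstream(name)
        return self.cache[name]

# Hypothetical authoritative data; the record is the slide's example.
AUTHORITATIVE = {"staffweb.cms.gre.ac.uk": "193.60.76.168"}

resolver = CachingResolver(AUTHORITATIVE.__getitem__)
first = resolver.resolve("staffweb.cms.gre.ac.uk")
second = resolver.resolve("staffweb.cms.gre.ac.uk")
print(first, second, resolver.upstream_queries)
```

In a real program the lookup itself would normally be left to the operating system, for example through socket.getaddrinfo.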


Hypertext
• Conventional text has a single linear narrative
path
• time line
• beginning – middle - end
• Hypertext
• multiple paths
• may be read in any order
• possibly an inappropriate order
• not a new concept
• an indexed or referenced document
• encyclopaedia, academic text
• Computers are very good at traversing indexes
• early computer hypertext systems appeared in 1967–68, e.g. the Hypertext Editing System at Brown University
• required an IBM mainframe computer
• first popular system Apple HyperCard in 1987

HyperText Markup Language

HTML
• Originally defined by Tim Berners-Lee circa 1992
• further developed by the IETF
• an application of the Standard Generalized Markup Language
(SGML)
• HTML 4.01 became a W3C Recommendation in 1999 and the basis of an international standard (ISO/IEC 15445:2000)
• later specifications are maintained by the W3C
• Tag based markup
• tags define the structure of a page
• metadata describing how to render the page
• headings, paragraphs, lists, etc.
• tags can have attributes
• provide extra clues about page rendering
e.g. colour, font, size, decoration
• anchor tags link to other (parts of) pages
• hypertext


HTML
<HTML>
<HEAD>
<TITLE>page1.html</TITLE>
</HEAD>
<BODY BGCOLOR="#FFFFDD">
<H1>Simple Example HTML page</H1>
<P>
This <I>paragraph</I> contains an anchor tag<BR>
<A HREF="page2.html">click here for the next page</A>
</P>
</BODY>
</HTML>
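
To show how a program sees this tag structure, the sketch below feeds the page above to Python's standard html.parser and records each start tag and the anchor target. The class name LinkCollector is illustrative.

```python
from html.parser import HTMLParser

PAGE = """<HTML>
<HEAD>
<TITLE>page1.html</TITLE>
</HEAD>
<BODY BGCOLOR="#FFFFDD">
<H1>Simple Example HTML page</H1>
<P>
This <I>paragraph</I> contains an anchor tag<BR>
<A HREF="page2.html">click here for the next page</A>
</P>
</BODY>
</HTML>"""

class LinkCollector(HTMLParser):
    """Record the start tags seen and the target of every anchor."""

    def __init__(self) -> None:
        super().__init__()
        self.tags = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag and attribute names in lower case.
        self.tags.append(tag)
        if tag == "a":
            self.links.extend(value for name, value in attrs if name == "href")

collector = LinkCollector()
collector.feed(PAGE)
print(collector.tags)
print(collector.links)
```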


Cascading Style Sheets CSS


• Rules to control HTML web page rendering in the web browser
• provides greater styling control than HTML
• Author styles
• external style sheets
• one style sheet can be used with many web pages
• one web page can use many style sheets
• improves consistency of style across the pages of a web site
• easier updating and maintenance of the web site
• embedded styles
• rules embedded in the head of an HTML page
• inline styles
• rules as attributes in individual HTML tags
• User styles and user agent styles
• applied by the user to cater for their individual needs
• Style rules cascade
• inherited from parent tag to child tag
• from external to embedded to inline


Client Side Scripting


• Executable script embedded into HTML pages
• Parsed and executed by the web client
• Usually JavaScript
• native support in most web clients
• Script may be included as:
• an external file
• embedded in the page head
• inline with the page content
• Can access and operate on page contents
• using the Document Object Model (DOM)
• Can respond to events in the browser
• e.g. onClick, onMouseOver, onKeyPress
• Used to enhance the user experience
• e.g. image rollovers, form data validation

Extensible Markup Language
XML
• Simplified subset of SGML
• A meta-language - extensible
• a language for defining other languages
e.g. XHTML, MathML, SVG, RSS, RDF
• Represents hierarchical data
• tree structure
• human and machine readable format
• Useful for data exchange and transformation
• Facilitates separation of content from presentation
• Enabling technology for web services and the semantic
web
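
The tree structure of an XML document can be seen by parsing one with Python's standard xml.etree.ElementTree. The basket document below is made up for illustration; its element and attribute names come from no standard vocabulary.

```python
import xml.etree.ElementTree as ET

# A made-up document; element and attribute names are illustrative only.
DOC = """<basket owner="joe">
  <fruit name="plum" quantity="3"/>
  <fruit name="pear" quantity="1"/>
</basket>"""

root = ET.fromstring(DOC)

# Walk the tree: the root element has child elements, each with attributes.
names = [fruit.get("name") for fruit in root]
total = sum(int(fruit.get("quantity")) for fruit in root.findall("fruit"))

print(root.tag, root.get("owner"), names, total)
```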

Extensible Hypertext Markup Language
XHTML
• XHTML is HTML reformulated as an XML application
• XHTML 1.0 first published in 2000
• three variants – transitional, frameset, strict
• XHTML 1.1 became a W3C recommendation in 2001
• strict and modular
• Many HTML tags and attributes are deprecated
• presentational tags and attributes are removed from strict XHTML
• Strict syntax forces separation of content from presentation
• XHTML tags describe only the page structure
• greatly simplifies page markup
• CSS is used to provide presentation
• cleaner code improves legibility, maintenance and accessibility
• recommended by WAI


Multipart HTML documents


• It is usual for HTML documents to be composed
from several component parts such as
• CSS
• JavaScript
• images
• media - audio, video, Shockwave (Flash movies)
• applets – small Java applications
• Each component part has to be downloaded from
a web server
• multiple HTTP requests are required to download a
single web page
• HTTP/1.1 can pipeline these requests
• components are not necessarily from the same web server

Server Side Scripting


• Application program running on the web server
• output is returned to the web browser
• usually HTML
• Can access resources on the server
• files, databases
• Common Gateway Interface (CGI)
• standard way to allow programs to run on the web server
• often Perl scripts
• may be written in any language the server supports
• output from the program (STDOUT) is routed through the web server
back to the client
• Web server scripting environments
• executable script embedded into HTML pages
e.g. Active Server Pages (ASP), PHP: Hypertext Preprocessor (PHP),
JavaServer Pages (JSP), Server Side Includes (SSI)
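
A CGI program can be very small. The sketch below, written in Python rather than Perl, assumes the web server supplies the query string in the QUERY_STRING environment variable, as the CGI standard specifies. The render function and the user parameter are hypothetical.

```python
import os
import sys
from html import escape
from urllib.parse import parse_qs

def render(query_string: str) -> str:
    """Build a complete CGI response: a header, a blank line, then the body."""
    args = parse_qs(query_string)
    user = args.get("user", ["world"])[0]
    # Escape user input before placing it in the page.
    body = f"<html><body><p>Hello, {escape(user)}</p></body></html>"
    return "Content-Type: text/html\r\n\r\n" + body

if __name__ == "__main__":
    # The web server passes request details in environment variables and
    # relays whatever the program writes to standard output.
    sys.stdout.write(render(os.environ.get("QUERY_STRING", "")))
```

The program states its own MIME type in the Content-Type header, since the server cannot infer it from the script's filename extension.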

Web Services

Evolution of the Web

[Figure: three generations of the Web. Generation 1: Static HTML. Generation 2: Web Applications. Generation 3: Web Services. Labels: HTML, XML.]


Evolution of the Web


• The Web was originally conceived to serve static HTML pages over
HTTP
• In a short period of time many technologies were introduced and
developed to provide dynamic, interactive web pages and stateful
web applications
• In an even shorter period of time the Web has dramatically changed
many of the ways in which we work, relax and function as
individuals and as a society
• Web technologies continue to advance to support service oriented
architectures and the semantic web
• From this soup of technologies Web 2.0 has evolved


Evolution of the Web


[Figure: Generation 4, Web 2.0: asynchronous partial page updates, user generated content, service oriented architectures. Labels: HTML, XML.]


Questions


Further reading
• The World Wide Web Consortium
http://www.w3.org/
http://www.w3.org/WAI/
http://www.w3.org/Addressing/

• Wikipedia
http://www.wikipedia.org/

• Mozilla
http://www.mozilla.org

• Apache
http://www.apache.org

• Web Resources collected by Kevin McManus
http://staffweb.cms.gre.ac.uk/~k.mcmanus/web


Questions
• What is the sequence of events in a web browser such
as Mozilla when you follow a link to the following URL?
http://staffweb.cms.gre.ac.uk/~k.mcmanus

• What are the advantages of using a simple stateless
protocol to implement the Web?

• Why was HTTP/1.1 developed?

• What MIME type will a web server respond with for a
filename extension of .php?

• What are stateful web applications?

