
Overview of HTTP

Herng-Yow Chen

Outline

HTTP: The Internet's Multimedia Courier
Web Clients and Servers
Resources
Transactions
Messages
Connections
Protocol Versions
Architectural Components of the Web

HTTP: The Internet's Multimedia Courier

Billions of multimedia objects cruise through the Internet:

Text files, HTML pages
Images
Videos
Programs (e.g. Java applets)

HTTP uses reliable data-transmission protocols (e.g. TCP/IP):

Good for users and application developers.
They don't have to worry about data/program integrity.

How does HTTP transport the Web's traffic?

Web Clients and Servers


Web content lives on web servers. The servers speak HTTP, so they are often called HTTP servers. The simplest model: request and response

HTTP clients send HTTP requests to servers, and the servers return the requested data in HTTP responses.

HTTP Request and Response


When you browse to NCNU's homepage at http://www.ncnu.edu.tw/index.html:

HTTP request (client to server www.ncnu.edu.tw): "Get me the document called /index.html."
HTTP response (server to client): "Okay, here it is, it's in HTML format and is 4843 bytes."

Resources

Static file resources

On the web server's filesystem.
They may be data.
They can be programs.

Dynamic content resources

Generated by software programs in the web server, or
Generated by remote programs, gateways, or agents.

Media Types

HTTP tags the object being transported with a data format label called a MIME type.

MIME (Multipurpose Internet Mail Extensions)

Originally designed to solve problems in moving multimedia messages between different email systems.
MIME worked so well for email that HTTP adopted it to describe and label its own multimedia content.

Media Types (cont.)

Web servers attach a MIME type to all HTTP object data. When a web browser gets an object back from a server, it displays or plays the object according to the associated MIME type. For example, it may:

display image files,
parse and format HTML files,
play audio files,
launch external plug-in software, or
launch external helper software.

Media Types (cont.)


A MIME type is a textual label. It is represented as a primary object type and a specific subtype, separated by a slash.

An HTML-formatted text document: text/html
A plain ASCII text file: text/plain
A JPEG image: image/jpeg
A GIF image: image/gif
An Apple QuickTime movie: video/quicktime
A Microsoft PowerPoint file: application/vnd.ms-powerpoint
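The type/subtype structure can be explored with Python's standard `mimetypes` module. This is only a small sketch: the helper name `describe` and the sample filenames are made up for illustration, and the module guesses from the file extension rather than inspecting the content, which is not necessarily how a given web server picks its labels.

```python
import mimetypes


def describe(filename):
    """Guess a file's MIME type and split it into primary type and subtype.

    Returns None when the extension is unknown to the mimetypes module.
    """
    mime_type, _encoding = mimetypes.guess_type(filename)
    if mime_type is None:
        return None
    primary, subtype = mime_type.split("/", 1)
    return mime_type, primary, subtype


# Hypothetical filenames, chosen to match the types listed above.
for name in ("index.html", "photo.jpg", "logo.gif"):
    print(name, "->", describe(name))
```

A browser makes the opposite decision: given the label in the response, it chooses how to display or play the object.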

Media Types (cont.)

HTTP response (server www.csie.ncnu.edu.tw to client):

Content-type: image/jpeg
Content-length: 12345

"Okay, here it is, it's in JPEG format and is 12345 bytes."

URI

The server resource name is called a uniform resource identifier, or URI. URIs are like the postal addresses of the Internet, uniquely identifying and locating information resources around the world. URIs come in two flavors, called URLs and URNs.

URL

The uniform resource locator (URL) is the most common form of resource identifier.

A URL tells precisely where a resource is located and how to access it.

http://www.csie.ncnu.edu.tw/pics1/.jpg

1. Use what protocol: http
2. Go to where: www.csie.ncnu.edu.tw (the server)
3. Grab what resource: /pics1/.jpg

Example URLs

Examples:

http://www.ncnu.edu.tw
http://english.csie.ncnu.edu.tw/login.php?name=hychen
ftp://hychen:1234@ftp.ncnu.edu.tw/img.gif

Most URLs follow a standardized format of three main parts:

The first part is called the scheme, and it describes the access protocol.
The second part gives the server's Internet address.
The rest names a resource on the server.

Today, almost every URI is a URL.
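The three-part format can be checked with `urllib.parse.urlsplit` from the Python standard library. The function name `url_parts` and the dictionary keys are assumptions made for this sketch; they simply mirror the three parts named on the slide.

```python
from urllib.parse import urlsplit


def url_parts(url):
    """Split a URL into the three parts named on the slide."""
    parts = urlsplit(url)
    resource = parts.path
    if parts.query:
        resource += "?" + parts.query
    return {
        "scheme": parts.scheme,   # 1. which protocol to use
        "host": parts.hostname,   # 2. which server to go to
        "resource": resource,     # 3. which resource to grab
    }


for example in (
    "http://www.ncnu.edu.tw",
    "http://english.csie.ncnu.edu.tw/login.php?name=hychen",
    "ftp://hychen:1234@ftp.ncnu.edu.tw/img.gif",
):
    print(url_parts(example))
```

Note that the user name and password in the FTP example are part of the address section of the URL, not of the host name, so `hostname` leaves them out.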

URN

The second flavor of URI is the uniform resource name, or URN. A URN serves as a unique name for a particular piece of content, independent of where the resource currently resides.

Advantages:

Location independent: allows a resource to move from place to place.
Access-protocol independent: allows a resource to be accessed by multiple network access protocols while maintaining the same name.

For example, RFC 2141 can be named by the URN

urn:ietf:rfc:2141

URN (cont.)

URNs are still experimental and not yet widely adopted. To work effectively, URNs need a supporting infrastructure to resolve resource locations; the lack of such infrastructure has also slowed their adoption. But URNs do hold some exciting promise for the future.

Transactions

An HTTP transaction consists of a request command (sent from client to server), and a response result (sent from server to client).
This communication happens with formatted blocks of data called HTTP messages.


HTTP Transaction (cont.)


The HTTP request message contains the command and the URI (client to server www.csie.ncnu.edu.tw):

GET /pics/hychen.jpg HTTP/1.0
Host: www.csie.ncnu.edu.tw

The HTTP response message contains the result of the transaction (server to client):

HTTP/1.0 200 OK
Content-type: image/jpeg
Content-length: 12345
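As a rough sketch of what the client side of this transaction puts on the wire, the request can be assembled as plain text. The helper name `build_request` is hypothetical; the only protocol facts relied on are that HTTP lines end with CRLF and that an empty line ends the headers.

```python
def build_request(method, path, host, version="HTTP/1.0"):
    """Assemble a minimal HTTP request message as text.

    Lines are terminated by CRLF and an empty line marks the end of the headers.
    """
    lines = [
        f"{method} {path} {version}",  # request command and URI
        f"Host: {host}",               # which server name the client asked for
        "",                            # blank line: end of headers
        "",
    ]
    return "\r\n".join(lines)


request = build_request("GET", "/pics/hychen.jpg", "www.csie.ncnu.edu.tw")
print(repr(request))
```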

Methods

HTTP supports several different request commands, called HTTP methods. Every HTTP request message has a method, which tells the server what action to perform, such as:

fetch a web page,
run a gateway program,
delete a file, etc.

Some common HTTP methods


HTTP method   Description
GET           Send the named resource from the server to the client.
PUT           Send data from the client into a named server resource.
DELETE        Delete the named resource from a server.
POST          Send client data into a server gateway application.
HEAD          Send just the HTTP headers from the response for the named resource.

Status Codes

Every HTTP response message comes back with a status code, a three-digit numeric code that tells the client:

If the request succeeded, or
If other actions are required.

HTTP also sends an explanatory textual reason phrase after each status code. The textual phrase is included only for descriptive purposes; the numeric code is used for all processing.

Some common HTTP status codes

HTTP status code   Description
200                OK. Document returned correctly.
302                Redirect. Go someplace else to get the resource.
404                Not Found. Can't find this resource.

The following status codes and reason phrases are treated identically by HTTP software:

200 OK
200 Document attached
200 Success
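The point that only the numeric code matters can be shown with a short sketch. The function names `parse_status_line` and `succeeded` are invented for this example, and treating the whole 200 to 299 range as success is an assumption that goes slightly beyond the three codes listed above.

```python
def parse_status_line(line):
    """Split a response start line into (version, numeric code, reason phrase)."""
    version, code, *reason = line.strip().split(" ", 2)
    return version, int(code), reason[0] if reason else ""


def succeeded(line):
    """Decide success from the numeric code only; the reason phrase is ignored."""
    _version, code, _reason = parse_status_line(line)
    return 200 <= code < 300


for start_line in (
    "HTTP/1.0 200 OK",
    "HTTP/1.0 200 Document attached",
    "HTTP/1.0 200 Success",
    "HTTP/1.0 404 Not Found",
):
    print(start_line, "->", succeeded(start_line))
```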

Web pages can consist of multiple objects

A web browser issues a cascade of HTTP transactions to fetch and display a graphics-rich web page. First, the browser performs one transaction to fetch the HTML skeleton, then issues additional HTTP transactions for each embedded image, graphic pane, Java applet, etc. Note that these embedded resources might even reside on different servers.

Composite web pages require separate HTTP transactions

[Figure: the parts of one page are fetched across the Internet from Server 1, Server 2, and Server 3.]
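To make the cascade concrete, the sketch below lists the extra URLs a browser would have to fetch after receiving the HTML skeleton. It uses the standard `html.parser` and `urllib.parse.urljoin`; the class and function names, the sample markup, and the host `images.example.org` are all assumptions for illustration, and a real browser looks at far more than `src` attributes.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class EmbeddedResourceFinder(HTMLParser):
    """Collect the URLs named by src attributes in an HTML page."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.resources = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "src" and value:
                # Relative references are resolved against the page's own URL.
                self.resources.append(urljoin(self.page_url, value))


def embedded_resources(page_url, html_text):
    """Return one absolute URL per embedded object; each needs its own transaction."""
    finder = EmbeddedResourceFinder(page_url)
    finder.feed(html_text)
    finder.close()
    return finder.resources


# Hypothetical page with one image on the same server and one on another server.
page = '<html><body><img src="/pics/a.gif"><img src="http://images.example.org/b.jpg"></body></html>'
print(embedded_resources("http://www.csie.ncnu.edu.tw/index.html", page))
```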

Messages

Request Message vs. Response Message

HTTP messages consist of three parts:

Start line: The first line of the message. It indicates what to do for a request or what happened for a response.

Header fields: Zero or more header fields follow the start line. Each header field consists of a name and a value, separated by a colon (:) for easy parsing. The headers end with a blank line.

Body: An optional part containing any kind of data (e.g. textual and binary data). Request bodies carry data to the web server; response bodies carry data back to the client.

A line-oriented text message structure

(a) Request message

Start line:   GET /text/hi-there.txt HTTP/1.0
Headers:      Accept: text/*
              Accept-Language: en, fr
Body:         (none)

(b) Response message

Start line:   HTTP/1.0 200 OK
Headers:      Content-type: text/plain
              Content-length: 19
Body:         Hi! I'm a message!
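Because the structure is line-oriented, a message can be taken apart with a few string operations. The following is a minimal sketch, assuming CRLF line endings and a text body; `parse_message` is a hypothetical helper, and it lowercases header names only to make lookups convenient.

```python
def parse_message(raw):
    """Split an HTTP message into start line, header fields, and body."""
    head, _separator, body = raw.partition("\r\n\r\n")  # blank line ends the headers
    lines = head.split("\r\n")
    start_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _colon, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, body


response = (
    "HTTP/1.0 200 OK\r\n"
    "Content-type: text/plain\r\n"
    "Content-length: 19\r\n"
    "\r\n"
    "Hi! I'm a message!\n"
)
start_line, headers, body = parse_message(response)
print(start_line)
print(headers)
print(repr(body), len(body))
```

The body here is 18 characters plus a trailing newline, which matches the Content-length of 19.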

Another message example

(a) Request message

GET /tools.html HTTP/1.0
User-agent: Mozilla/4.75[en]
Host: www.csie.ncnu.edu.tw
Accept: text/html, image/gif, image/jpeg
Accept-Language: en

(b) Response message

HTTP/1.0 200 OK
Date: Sun, 01 Oct 2003 23:25:17 GMT
Server: Apache/1.3.11
Last-modified: Tue, 04 Jul 2003 09:46:21 GMT
Content-type: text/html
Content-length: 403

<HTML>
<HEAD> Web Technologies </HEAD>
<BODY>
<H1> Web Technologies </H1>
</BODY>
</HTML>

Connections

How is an HTTP message moved from place to place across Transmission Control Protocol (TCP) connections?

HTTP is an application-layer protocol, which doesn't worry about the details of network communication. Instead, it leaves the details of networking to TCP/IP.

TCP/IP

TCP provides:

Error-free data transportation
In-order delivery
Unsegmented data stream

The HTTP protocol is layered over TCP. Namely, HTTP uses TCP to transport its message data. Likewise, TCP is layered over IP.

HTTP network protocol stack

HTTP                              Application layer
TCP                               Transport layer
IP                                Network layer
Network-specific link interface   Data link layer
Physical network hardware         Physical layer

Connections, IP addresses, Port Numbers

Before an HTTP client can send a message to a server, it needs to establish a TCP/IP connection between the client and the server using Internet Protocol (IP) addresses and port numbers.

DNS server: domain name -> IP address
Default port number: 80
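The two lookups a client performs before connecting can be sketched as follows. The names `connection_target` and `resolve` are made up for this example; `resolve` wraps the standard `socket.gethostbyname`, needs a working DNS resolver for real host names, and is deliberately not called on a real host here.

```python
import socket
from urllib.parse import urlsplit

DEFAULT_HTTP_PORT = 80  # used when the URL does not name a port


def connection_target(url):
    """Return the (host name, port) pair a client must connect to for a URL."""
    parts = urlsplit(url)
    return parts.hostname, parts.port or DEFAULT_HTTP_PORT


def resolve(hostname):
    """Translate a domain name into an IPv4 address via the system resolver."""
    return socket.gethostbyname(hostname)


print(connection_target("http://www.csie.ncnu.edu.tw:80/~hychen/index.html"))
print(connection_target("http://www.ncnu.edu.tw/index.html"))
```

With the host name resolved and the port known, the client has everything it needs to open the TCP connection.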

a. Click anchor: <A href=http://www.csie.ncnu.edu.tw:80/~hychen/index.html>

b. Find and set up a connection to www.csie.ncnu.edu.tw across the Internet (the server is listening on port 80).

c. Send the request:

GET /~hychen/index.html HTTP/1.0
User-agent: Netscape
Accept: text/plain
Accept: text/html
Accept: image/*

The server parses the request:

Method: GET
Document: /~hychen/index.html
Protocol: HTTP, version 1.0
User-agent: Netscape
Accept: text/plain,text/html,image/*

a. Look for /~hychen/index.html

b. If the document cannot be found, send error headers to the client:

HTTP/1.0 404 Not Found
Server: Apache 1.2b7
Date: Thu, 22, May ...
Content-type: text/html
Content-length: 0

b. Otherwise, send headers to the client:

HTTP/1.0 200 Document follows
Server: Apache 1.2b7
Date: Thu, 22, May 1997 14:00:00 GMT
Content-type: text/html
Content-length: 1066
Last-modified: Sun, 18, May 1997 ....

c. Send file (index.html) to client

d. Break connection

Dynamic HTTP connection


a. Submit: <form action=cgi-bin/add method=GET>

b. Find and set up a connection to www.csie.ncnu.edu.tw across the Internet (the server is listening).

c. Send the request:

GET /cgi-bin/add?name=hychen&year=58&month=6&..... ...... HTTP/1.0
User-agent: Netscape
Accept: text/plain
Accept: text/html
Accept: image/*

a. httpd parses the client request:

GET /cgi-bin/add?name=hychen&year=58.... HTTP/1.0
User-agent: Netscape
Accept: text/plain
Accept: text/html
Accept: image/*

b. httpd sets up the executable environment for the add program in cgi-bin:

Query_String: name=hychen&year=58&.....

c. add returns HTML to httpd:

Content-type: text/html

<html> <head> <title> .... </title></head>
<body> <h1> Add successfully! </h1> </body> </html>

d. httpd sends headers and the result to the client across the Internet:

Status: 200 Document follows
Server: Apache 1.2b7
Date: ....
Content-type: text/html
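The gateway program's side of this exchange might look roughly like the sketch below. It is not the actual `add` program from the slides: the function names are hypothetical, only the three form fields visible in the request (name, year, month) are used, and the environment variable is spelled `QUERY_STRING` as in the CGI convention.

```python
from urllib.parse import parse_qs


def read_form_fields(environ):
    """Decode the form fields that httpd passes in the QUERY_STRING variable."""
    query = environ.get("QUERY_STRING", "")
    # parse_qs returns a list per field; keep the first value of each.
    return {name: values[0] for name, values in parse_qs(query).items()}


def add_program(environ):
    """Return the decoded fields and the text the program hands back to httpd."""
    fields = read_form_fields(environ)
    body = "<html><body><h1> Add successfully! </h1></body></html>"
    # A header line, a blank line, then the generated HTML.
    output = "Content-type: text/html\n\n" + body
    return fields, output


# Hypothetical environment, standing in for the one httpd would set up.
fields, output = add_program({"QUERY_STRING": "name=hychen&year=58&month=6"})
print(fields)
print(output)
```

httpd then adds its own status line and server headers before sending the result to the client.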

Simulate an HTTP client using Telnet

The Telnet utility connects your keyboard to a destination TCP port and connects the TCP port's output back to your display screen. You can use the Telnet utility to talk directly to web servers. The web server treats you as a web client, and any data sent back on the TCP connection is displayed on screen.

Try: telnet www.csie.ncnu.edu.tw 80
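The same experiment can be scripted with a raw TCP socket, which is roughly what Telnet does on your behalf. This is a sketch under simple assumptions: the helper name `raw_http_get` is invented, the request is HTTP/1.0 so the server is expected to close the connection when it has finished, and the reply is decoded as ISO-8859-1 only so that arbitrary bytes can be printed.

```python
import socket


def raw_http_get(host, port, path):
    """Open a TCP connection, send a minimal GET, and return everything sent back."""
    request = f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n"
    chunks = []
    with socket.create_connection((host, port), timeout=10) as conn:
        conn.sendall(request.encode("ascii"))
        while True:
            data = conn.recv(4096)
            if not data:  # the server closed the connection: response complete
                break
            chunks.append(data)
    return b"".join(chunks).decode("iso-8859-1")


# Example (needs network access):
#   print(raw_http_get("www.csie.ncnu.edu.tw", 80, "/"))
```

The returned text contains the status line, the headers, a blank line, and the body, exactly as they would scroll past in a Telnet session.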

Try another tool

For a more flexible tool, you can check out nc (netcat). The nc tool lets you easily manipulate and script UDP- and TCP-based traffic, including HTTP. See http://www.bgw.org/tutorials/utilities/nc.php for details.

Protocol Versions

There are several versions of the HTTP protocol in use today. HTTP applications need to work hard to robustly handle the different variations:

HTTP/0.9
HTTP/1.0
HTTP/1.0+
HTTP/1.1
HTTP-NG (a.k.a. HTTP/2.0)

HTTP/0.9

The 1991 prototype version of HTTP is known as HTTP/0.9. This protocol contains many serious design flaws and should be used only to interoperate with legacy clients. HTTP/0.9 supports only the GET method to fetch simple HTML objects; it does not support MIME types, HTTP headers, or version numbers. It was soon replaced with HTTP/1.0.

HTTP/1.0

HTTP/1.0 was the first version of HTTP that was widely deployed. HTTP/1.0 added version numbers, HTTP headers, additional methods, and multimedia object handling. HTTP/1.0 made it practical to support graphically appealing web pages and interactive forms, which helped promote the wide-scale adoption of the WWW. HTTP/1.0 was never well specified. It represented a collection of best practices in a time of rapid commercial and academic evolution of the protocol.

HTTP/1.0+

Many popular web clients and servers rapidly added features to HTTP in the mid-1990s to meet the demands of a rapidly expanding, commercially successful WWW. Many of these features, including long-lasting keep-alive connections, virtual hosting support, and proxy connection support, were added to HTTP and became unofficial, de facto standards. This informal, extended version of HTTP is often referred to as HTTP/1.0+.

HTTP/1.1

HTTP/1.1 focused on correcting architectural flaws in the design of HTTP, specifying semantics, introducing significant performance optimizations, and removing mis-features. HTTP/1.1 also included support for the more sophisticated web applications and development that were under way in the late 1990s. HTTP/1.1 is the current version of HTTP.

HTTP-NG (a.k.a. HTTP/2.0)

HTTP-NG is a prototype proposal for an architectural successor to HTTP/1.1 that focuses on significant performance optimizations and a more powerful framework for remote execution of server logic. The HTTP-NG research effort concluded in 1998, and so far, there are no plans to advance this proposal as a replacement for HTTP/1.1.

Architectural Components of the Web

In addition to the most popular web applications (i.e., web browsers and servers), there are many other web applications that we interact with on the Internet, including:

Proxies: HTTP intermediaries that sit between clients and servers.
Caches: HTTP storehouses that keep copies of popular web pages close to clients.
Gateways: Special web servers that connect to other applications.
Tunnels: Special proxies that blindly forward HTTP communications.
Agents: Semi-intelligent web clients that make automated HTTP requests.

Proxies

HTTP proxy servers, sitting between clients and servers, are important building blocks for web security, application integration, and performance optimization. A proxy works by:

Receiving all of the client's HTTP requests, and
Relaying the requests to the server (perhaps after modifying the requests).

These applications act as a proxy for the user, accessing the server on the user's behalf.

Proxies are often used for security, acting as trusted intermediaries through which all web traffic flows. They can also filter requests and responses; for example:

To detect application viruses in messages.
To filter adult content away from elementary-school students.

We will talk about proxies in detail in later lectures.

Proxy (cont.)

Proxies relay traffic between client and server.

[Figure: a proxy placed between the client and the Internet.]

Caches

A web cache or caching proxy is a special type of HTTP proxy server that keeps copies of popular documents passing through the proxy. The next client requesting the same document can be served from the cache's own copy; consequently, the client may be able to download the document much more quickly from a nearby cache than from a distant web server. HTTP defines many facilities to make caching more effective and to regulate the freshness and privacy of cached content. We shall talk about caching technology in a later lecture.
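The core idea of serving later requests from a stored copy fits in a few lines. This is a deliberately simplified sketch: `SimpleCache` and its counters are invented for illustration, the origin server is represented by an ordinary function, and the freshness and privacy rules mentioned above are ignored entirely.

```python
class SimpleCache:
    """Keep a copy of every document fetched and reuse it for later requests."""

    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin  # callable standing in for the origin server
        self._copies = {}
        self.hits = 0
        self.misses = 0

    def get(self, url):
        if url in self._copies:
            self.hits += 1    # served from the cache's own copy
        else:
            self.misses += 1  # first request: go to the origin and keep a copy
            self._copies[url] = self._fetch(url)
        return self._copies[url]


def slow_origin(url):
    """Hypothetical origin server; a real one would be reached over HTTP."""
    return f"contents of {url}"


cache = SimpleCache(slow_origin)
cache.get("http://www.csie.ncnu.edu.tw/index.html")
cache.get("http://www.csie.ncnu.edu.tw/index.html")
print(cache.hits, cache.misses)
```

A real caching proxy must also decide when a stored copy is too old to reuse, which is what the HTTP caching facilities are for.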

Caches (cont.)

Caching proxies keep local copies of popular documents to improve performance.

[Figure: a caching proxy placed between the client and the Internet.]

Gateways

Gateways are special servers that act as intermediaries for other servers. They are often used to convert HTTP traffic to another protocol. A gateway always receives requests as if it were the origin server for the resource; the client may not be aware it is communicating with a gateway. For example, in the following figure, an HTTP/FTP gateway receives requests for FTP URIs via HTTP requests but fetches the documents using the FTP protocol. The resulting document is packed into an HTTP message and sent to the client. We shall talk about gateways in a later lecture.

HTTP/FTP gateway

[Figure: an HTTP client talks HTTP to the HTTP/FTP gateway, which talks FTP on its other side.]

Tunnels

Tunnels are HTTP applications that, after setup, blindly relay raw data between two connections. HTTP tunnels are often used to transport non-HTTP data over one or more HTTP connections, without looking at the data. One popular use of HTTP tunnels is to carry encrypted Secure Sockets Layer (SSL) traffic through an HTTP connection, allowing SSL traffic through corporate firewalls that permit only web traffic. As presented in the next slide, an HTTP/SSL tunnel receives an HTTP request to establish an outgoing connection to a destination address and port, then proceeds to tunnel the encrypted SSL traffic over the HTTP channel so that it can be blindly relayed to the destination server.

HTTP tunnel forwards data across non-HTTP networks

[Figure: an SSL connection enters the tunnel start, is carried as SSL over an HTTP connection to port 80 of the tunnel endpoint, and continues from there as an SSL connection.]

Agents

User agents (or just agents) are client programs that make HTTP requests on the user's behalf. Any application that issues web requests is an agent, such as a web browser.

Other kinds of user agents: spiders or web robots

Machine-automated user agents that autonomously wander the Web, issuing HTTP transactions and fetching content, without human supervision.

We shall talk about spiders for search engines in a later lecture.

Agents (cont.)

[Figure: a search engine spider fetches content from several web servers and feeds the search engine database.]

HTTP protocol information

http://www.w3.org/Protocols/
Many great links about the HTTP protocol.

http://www.ietf.org/rfc/rfc2616.txt
RFC 2616, "Hypertext Transfer Protocol -- HTTP/1.1", is the official specification for HTTP/1.1, the current version of the HTTP protocol.

http://www.ietf.org/rfc/rfc1945.txt
RFC 1945, "Hypertext Transfer Protocol -- HTTP/1.0", is an informational RFC that describes the modern foundation for HTTP.

http://www.w3.org/Protocols/HTTP/AsImplemented.html
A description of the 1991 HTTP/0.9 protocol, which implements only GET requests and has no content typing.

Historical Perspective

http://www.w3.org/Protocols/WhyHTTP.html
This brief page from 1991, from the author of HTTP, highlights some of the original, minimalist goals of HTTP.

http://www.w3.org/History.html
"A Little History of the World Wide Web" gives a short but interesting perspective on the early goals of the WWW and HTTP.

http://www.w3.org/DesignIssues/Architecture.html
"Web Architecture from 50,000 Feet" paints a broad, ambitious view of the WWW and the design principles that affect HTTP and related web technologies.

Other World Wide Web info.

http://www.w3.org/
The World Wide Web Consortium (W3C) is the technology steering team for the Web. The W3C develops interoperable technologies (specifications, guidelines, software, and tools) for the evolving Web.

http://www.ietf.org/rfc/rfc2396.txt
RFC 2396, "Uniform Resource Identifiers (URI): Generic Syntax", is the detailed reference for URIs and URLs.

http://www.ietf.org/rfc/rfc2141.txt
RFC 2141, "URN Syntax", is a 1997 specification describing URN syntax.

http://www.ietf.org/rfc/rfc2046.txt
RFC 2046, "MIME Part 2: Media Types", is the second in a suite of five Internet specifications defining the Multipurpose Internet Mail Extensions standard for multimedia content management.

http://www.wrec.org/Drafts/draft-ietf-wrec-taxonomy-06.txt
This Internet Draft, "Internet Web Replication and Caching Taxonomy", specifies standard terminology for web architectural components.

Summary

We gave a quick introduction to HTTP.
We highlighted HTTP's role as a multimedia transport protocol, with the help of MIME (after HTTP/1.0).
We outlined how HTTP uses URIs to name multimedia resources.
We sketched how HTTP request and response messages are used to access remote multimedia resources.
We surveyed types of HTTP applications other than web browsers.
