
Web Basics

Content
• Internet & The Web
• Client-server architecture
• TCP Protocol
• UDP Protocol
• OSI Stack
• TCP/IP Stack
• HTTP Protocol
• Web Servers
• Web Browsers
• Developer tools
• State Management: Cookies, Sessions
Internet(1)
● Global system of interconnected computer networks
● “Network of networks”
● Systems built on client-server architecture model
● Collection of services
○ World Wide Web
○ Email
○ FTP
● Protocols: HTTP, TCP/IP stack
○ Various means of access: broadband, phone, cellular
○ Communication done via the Internet Protocol (IP)
○ World Wide Web: service of the Internet for accessing
resources.
Internet(2)
Internet Protocol(IP)
● Main communications protocol on the
internet
● Delivers packets from the source host to the destination host
based on the IP addresses in the packet headers.
● No central monitoring, performance
measurement facility
● Versions: IPv4, IPv6
URL (Uniform Resource Locator)
● Every item on the Web is a resource with a unique
identifier: the URI (Uniform Resource Identifier)
● Provides reference to a web resource and a mechanism to
retrieve it.
● Domain name: identification string defining a realm of
administrative autonomy or authority on the Internet
○ DNS: Domain Name System
■ Hierarchical: .com, .info, .net, .edu
■ Domain names are case insensitive
● Typical URL Form:
○ PROTOCOL://HOSTNAME/{path_to_resource}
○ More specifically:
URL (Uniform Resource Locator) (2)
● scheme:[//[user:password@]host[:port]][/]path[?query][#fragment]
● Scheme: e.g. http/https, ftp, mailto, file, data.
● Hierarchical part: optional authority, followed by the path:
○ Authority: introduced by //
○ Optional authentication (user:password@)
○ Hostname: domain name or IP address
○ Path: hierarchical path to the resource
○ NOT IDENTICAL TO THE PHYSICAL PATH ON THE SERVER
○ Query (?): query string of non-hierarchical data
○ Fragment (#): reference to a secondary resource, e.g. a heading or
the identifier of a specific element.
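The URL components listed above can be pulled apart with Python's standard `urllib.parse` module. This is a minimal sketch; the URL itself is a hypothetical example, not one from the slides.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical URL used only for illustration.
url = "https://user:secret@www.example.com:8443/docs/index.html?lang=en&page=2#intro"

parts = urlsplit(url)
components = {
    "scheme": parts.scheme,
    "user": parts.username,
    "password": parts.password,
    "host": parts.hostname,
    "port": parts.port,
    "path": parts.path,
    "query": parse_qs(parts.query),   # non-hierarchical key/value data
    "fragment": parts.fragment,
}
for name, value in components.items():
    print(f"{name:9} {value}")
```

Note that the path printed here is only the URL path; how it maps to a file or program on the server is decided by the web server.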
Domain names
Illustration of domain names, with labels and fully qualified domain name (FQDN) hierarchy.

Host Name Label   Subdomain Label   Second-Level Domain Label   Top-Level Domain Label   Fully Qualified Domain Name
-                 -                 utexas                      edu                      utexas.edu.
www               -                 utexas                      edu                      www.utexas.edu.
computerstore     -                 utexas                      edu                      computerstore.utexas.edu.
www               mccombs           utexas                      edu                      www.mccombs.utexas.edu.
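As a rough sketch of the label hierarchy, an FQDN can be split on dots and read from right to left. The function name `split_fqdn` and the returned keys are made up for this illustration.

```python
def split_fqdn(fqdn: str) -> dict:
    """Split a fully qualified domain name into its labels."""
    # Domain names are case insensitive, so normalise to lower case.
    labels = fqdn.rstrip(".").lower().split(".")
    return {
        "labels": labels,
        "top_level": labels[-1],
        "second_level": labels[-2] if len(labels) > 1 else None,
        "remaining": labels[:-2],   # host name and subdomain labels, left to right
    }

info = split_fqdn("www.mccombs.utexas.edu.")
print(info)
```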
Client-server architecture
● System composed of a client program that consumes
services provided by a server program.
● Client and server implementations are agnostic of each
other (decoupled).
○ On the Web: Client -> application responsible for building
the web request
○ Server -> application responsible for processing the client’s
request and providing a response to the client.
● Client applications: run in web browsers
● Server applications: run on web servers
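A minimal sketch of the client-server split, using only Python's standard library on the loopback interface. The handler class `HelloHandler` and the helper `fetch_once` are hypothetical names; a real deployment would use a production web server rather than `http.server`.

```python
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer

class HelloHandler(BaseHTTPRequestHandler):
    """Server side: process the request and build a response."""
    def do_GET(self):
        body = f"You asked for {self.path}".encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example output quiet

def fetch_once(path: str):
    """Start a local server, act as the client once, then shut down."""
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0: OS picks a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = HTTPConnection("127.0.0.1", server.server_address[1], timeout=5)
        conn.request("GET", path)          # client side: build and send the request
        response = conn.getresponse()
        result = response.status, response.read().decode("utf-8")
        conn.close()
        return result
    finally:
        server.shutdown()
        server.server_close()

status, text = fetch_once("/index.html")
print(status, text)
```

The client only knows the protocol and the address, not how the server produces its answer, which is the decoupling described above.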
Web browser
● Software application for retrieving and presenting information
resources available on the Web.
● Responsibilities:
● Resource retrieval (from the server)
● Navigation between resources
● Resource presentation:
○ Render HTML (or other formats) to the user
● Best-known web browsers: Chrome, IE/Microsoft Edge, Firefox
● Components of a web browser: UI, layout engine, rendering
engine, JavaScript interpreter, UI back-end, networking
component, data persistence component (e.g. cookies, history,
bookmarks).
○ Technology stack: HTML, CSS, JS-based frameworks
Web server
● Computer system running applications exposed over the
Internet/intranet by means of the TCP/IP stack (e.g. HTTP, UDP).
● Responsibilities:
○ Interpret requests (HTTP) from the web client, perform processing in
order to generate the response, and return it to the web client.
○ Path translation: map the path component of a URL to a local file
system resource or an internal/external program.
○ Response generation: static or dynamic content in the supported
MIME-type Formats: HTML, JSON, XML.
● Examples:
● Apache: Java- and PHP-based server applications
● Internet Information Services (IIS): .NET-based server applications
ISO-OSI Stack
● OSI: Open Systems Interconnection
● Layers
○ Physical layer
○ Data Link
○ Network
○ Transport
○ Session
○ Presentation
○ Application
ISO-OSI Stack(scheme)
OSI Stack- Physical layer
● Physical layer
○ transmission and reception of the unstructured raw bit
stream over a physical medium
○ describes the electrical/optical, mechanical, and functional interfaces
to the physical medium, and carries the signals for all of
the higher layers. It provides:
○ Data encoding: DSP for bit and signal synchronization
○ Physical medium attachment: pins, external transceivers
○ Transmission technique: broadband(analog),
baseband(digital)
○ Physical medium transmission: optical/electrical signals
OSI Stack- Data link layer
● Data Link Layer
○ error-free transfer of data frames from one node to another
over the physical layer, allowing layers above it to assume
virtually error-free transmission over the link
○ Responsibilities:
○ Link establishment and termination
○ Frame traffic and control
○ Frame sequencing
○ Frame acknowledgment
○ Frame delimiting
○ Frame error checking
○ Media access management
OSI Stack- Network layer
● Network Layer
○ Routing frames over the network
○ Subnet traffic control: routers (network layer
intermediate systems) can instruct a sending station to
"throttle back" its frame transmission when the router's
buffer fills up
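○ Frame fragmentation: a router can fragment a frame that exceeds the
downstream maximum transmission unit (MTU), for reassembly at the
destination station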
○ Logical-physical address mapping: translates logical
addresses, or names, into physical addresses
○ Subnet usage accounting: has accounting functions to
keep track of frames forwarded by subnet intermediate
systems, to produce billing information.
OSI Stack- Transport layer
● Transport layer responsibility: messages are delivered error-free, in
sequence, and with no losses or duplications. Includes error detection
and recovery.
● End-to-end transmission
● Provides:
○ Message segmentation: a message from an upper layer is split into
smaller units
○ Message acknowledgement: reliable end-to-end message delivery
○ Message traffic control: stop transmission when no buffers are available
○ Session multiplexing: several message streams/sessions onto one logical link,
with association of messages to sessions (-> Session Layer)
○ Header info includes: control information (message start/end),
sequencing
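Segmentation and sequencing can be sketched in a few lines. This is a toy model, not a real transport protocol: the function names `segment` and `reassemble` are invented here, and the byte offset is used as the sequence number.

```python
import random

def segment(message: bytes, size: int):
    """Split a message into (sequence_number, chunk) pairs."""
    return [(offset, message[offset:offset + size])
            for offset in range(0, len(message), size)]

def reassemble(segments):
    """Drop duplicates and restore the original order using sequence numbers."""
    unique = dict(segments)   # duplicate sequence numbers collapse to one entry
    return b"".join(unique[seq] for seq in sorted(unique))

message = b"messages delivered error-free, in sequence"
segments = segment(message, 8)
received = segments + [segments[1]]   # simulate a duplicated segment
random.shuffle(received)              # simulate out-of-order arrival
restored = reassemble(received)
print(restored == message)
```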
OSI-Stack- Transport layer TCP
http://blog.pluralsight.com/networking-basics-tcp-udp-tcpip-osi-models
OSI-Stack Transport layer-UDP
http://blog.pluralsight.com/networking-basics-tcp-udp-tcpip-osi-models
OSI Stack- Session layer
● The session layer allows session establishment
between processes running on different stations. It
provides:
● Session establishment, maintenance and
termination: allows two application processes on
different machines to establish, use and terminate a
connection, called a session.
● Session support: performs the functions that allow
these processes to communicate over the network,
performing security, name recognition, logging, and
so on.
ISO-OSI Stack- Presentation layer
● Presentation layer:
○ Define formats for data to be transmitted:
○ Character code translation: for example, ASCII to
EBCDIC.
○ Data conversion: bit order, CR to CR/LF, integer to
floating point, and so on.
○ Data compression: reduces the number of bits
that need to be transmitted on the network.
○ Data encryption: encrypt data for security
purposes. For example, password encryption.
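A small sketch of three presentation-layer tasks using Python's standard library. The choice of `cp037` as the EBCDIC code page and the sample strings are assumptions made for illustration.

```python
import zlib

# Character code translation: the same text in ASCII and in EBCDIC.
text = "HELLO"
ascii_bytes = text.encode("ascii")
ebcdic_bytes = text.encode("cp037")   # cp037 is one EBCDIC code page shipped with Python

# Data conversion: change the line-ending convention to CR/LF.
unix_text = "line one\nline two\n"
network_text = unix_text.replace("\n", "\r\n")

# Data compression: fewer bytes to transmit for repetitive data.
payload = b"web basics " * 100
compressed = zlib.compress(payload)

print(ascii_bytes.hex(), ebcdic_bytes.hex())
print(len(payload), "->", len(compressed), "bytes")
```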
ISO-OSI Stack- Application layer
● Application layer: accessing network services
● Resource sharing and device redirection
● Remote file access
● Remote printer access
● Inter-process communication
● Network management
● Directory services
● Electronic messaging (such as mail)
● Network virtual terminals
TCP-IP Stack
● Most common protocol suite on the Internet.
● Provides end-to-end connectivity, specifying
how data should be packetized, addressed,
transmitted, routed and received at the
destination.
● Four-layer system: Application, Transport,
Internet, Network Interface
TCP-IP Stack/ISO OSI Stack
TCP-IP Stack- Detailed view
TCP-IP Stack- Network Interface layer
● Network Interface Layer: the place where the
TCP/IP protocols from the higher
layers interface to the local network.
○ Similar to the data link layer of the
ISO/OSI model
○ No TCP/IP protocol running at this layer
○ Protocols for serial line connections (dial-up): SLIP, PPP
TCP-IP Stack- Internet Layer
● Internet Layer: send packets across multiple networks
● Internet protocol:
○ Host addressing and identification: hierarchical IP system
(IPv4, IPv6)
○ Packet routing: sending datagrams (data packets) from
source to destination by forwarding them to the next
network router closer to the final destination.
■ Does not distinguish between the
various transport layer protocols.
■ IPv4: 32-bit addresses, limited to about 4.3 billion (2^32) addresses
■ IPv6: 128-bit addresses
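The address-space arithmetic can be checked with Python's `ipaddress` module. The addresses used here come from the ranges reserved for documentation and are illustrative only.

```python
import ipaddress

ipv4_space = 2 ** 32     # 32-bit addresses
ipv6_space = 2 ** 128    # 128-bit addresses

addr = ipaddress.ip_address("192.0.2.10")      # documentation-range address
as_integer = int(addr)                         # the same address as a 32-bit number
network = ipaddress.ip_network("192.0.2.0/24")

print(ipv4_space, addr.version, as_integer, addr in network)
```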
TCP-IP Stack: Transport layer
● Transport Layer: establish basic data channels that
applications use for task-specific data exchange.
○ Provides process-to-process connectivity: end-to-end
services independent of the structure of the user data
and logistics of exchanging information.
○ End-to-end message transfer independent of the
underlying network, including error-control,
segmentation, flow control, congestion control and
application addressing (port numbers).
○ End-to-end message transmission: connection-oriented (TCP)
or connectionless (UDP)
TCP-IP Stack- TCP segment format
TCP-IP Stack- TCP Segment Format
● Source Port, Destination Port: endpoints of the connection
(host + port = unique endpoint)
● Sequence number, Acknowledgement number: positions in the
byte stream, used to reorder segments and to retransmit
lost ones. The acknowledgement number is set
to the sequence number of the next byte expected.
● Data offset/TCP header length: how many 4-byte words are
contained in the TCP header.
● Window field: how many bytes can be transmitted before an
acknowledgement is received.
● Checksum field: detects corruption of the header and data
in transit (an integrity check, not a security mechanism).
● Actual user data: follows the header.
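The fixed 20-byte part of the header described above can be unpacked with `struct`. This is a sketch that ignores TCP options and does not verify the checksum; the sample segment is hand-built, and `parse_tcp_header` is a name chosen for this example.

```python
import struct

# Network byte order: ports, sequence, acknowledgement, offset byte, flags,
# window, checksum, urgent pointer (20 bytes without options).
TCP_HEADER = struct.Struct("!HHIIBBHHH")

def parse_tcp_header(segment: bytes) -> dict:
    (src, dst, seq, ack, offset_byte, flags,
     window, checksum, urgent) = TCP_HEADER.unpack_from(segment)
    header_len = (offset_byte >> 4) * 4     # data offset counts 4-byte words
    return {
        "src_port": src, "dst_port": dst, "seq": seq, "ack": ack,
        "header_len": header_len, "flags": flags, "window": window,
        "checksum": checksum, "payload": segment[header_len:],
    }

# Hand-built sample: checksum left at 0, flags 0x18 = PSH+ACK.
sample = TCP_HEADER.pack(49152, 80, 1000, 5000, 5 << 4, 0x18, 65535, 0, 0) + b"GET /"
fields = parse_tcp_header(sample)
print(fields)
```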
TCP/IP Stack UDP
● UDP: used when reliable delivery is not required and the
extra overhead of TCP should be avoided.
● Efficient and fast transmission
● Connectionless protocol: reliability, if needed, is
handled at the application layer
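A minimal sketch of connectionless delivery: one datagram sent over the loopback interface with no connection setup. It assumes local UDP sockets are permitted; on a real network the datagram could be lost and nothing here would retransmit it.

```python
import socket

receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))     # port 0: let the OS pick a free port
receiver.settimeout(2)

sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"ping", receiver.getsockname())   # no handshake before sending

data, source = receiver.recvfrom(1024)   # one datagram, delivered as a unit
sender.close()
receiver.close()
print(data, source)
```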
TCP/IP Stack- UDP segment format
TCP/IP Stack- TCP vs UDP
TCP-IP Stack: Application layer
● Application layer: protocols used by most
applications for providing user services or
exchanging application data over the
network.
● Examples: HTTP, FTP, SMTP, DHCP.
● Main target for application developers
HTTP

● Most used application protocol on the Web


● Request-Response protocol between client and server
○ HTTP Client: sends request message to the server
○ Server: returns the response message
○ “Pull protocol”: client pulls information from the server
○ Typically runs over TCP/IP
HTTP

● Features:
○ Stateless: current request totally
independent and unaware of previous
requests
○ Negotiation of data type/representation, so that
systems are independent of the transferred data
○ Designed for distributed, collaborative,
hypermedia information systems.
HTTP-Request messages
● HTTP Message Structure:
○ Message header (mandatory)
○ Blank line-> separate header and body
○ Message body (optional)
● HTTP Request message structure:
○ Request line: request-method-name request-URI HTTP-version
○ Request-method-name: predefined HTTP methods: GET, POST,
PUT, HEAD, PATCH, DELETE, OPTIONS
○ Request URI: requested resource identifier
○ HTTP versions: 1.0/1.1
○ Example: GET /index.html HTTP/1.1
○ Request headers: "name: value" pairs, one per line
○ e.g.: Host: www.google.com
○ Authorization: Basic DFSDFSDFSDFEWTTW3243242
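The message structure above (request line, headers, blank line, optional body) can be assembled by hand. `build_request` is a hypothetical helper for illustration; real clients would normally rely on an HTTP library.

```python
def build_request(method: str, uri: str, headers: dict, body: bytes = b"") -> bytes:
    """Assemble a raw HTTP/1.1 request message."""
    lines = [f"{method} {uri} HTTP/1.1"]                          # request line
    lines += [f"{name}: {value}" for name, value in headers.items()]
    head = "\r\n".join(lines) + "\r\n\r\n"                        # blank line ends the headers
    return head.encode("ascii") + body

raw = build_request("GET", "/index.html",
                    {"Host": "www.google.com", "Accept": "text/html"})
print(raw.decode("ascii"))
```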
HTTP-Request methods
● HTTP request methods:
● GET: retrieve a web resource from the server
● HEAD: retrieve only the headers a GET request would obtain, e.g. the
last-modified date
● POST: post/submit data to the server to be processed
○ Create a resource on the server
● PUT: ask the server to store data
○ Update a resource on the server
● DELETE: delete a resource from the server
● TRACE: ask the server to return a diagnostic trace of the actions it takes
● OPTIONS: get the request methods supported by the server
● CONNECT: tell a proxy to make a connection to another host and relay
the content.
○ Used to tunnel SSL/TLS connections
HTTP Request methods- GET
● GET: retrieve a web resource from the server
○ Format: GET request-URI HTTP-version
■ (optional request headers)
■ (blank line)
■ (optional request body)
○ GET: the method name is case sensitive and must be in uppercase
○ Request-URI: resource path, must begin from the root “/” of the document
base directory
○ HTTP-Version: HTTP/1.0 or HTTP/1.1. The client negotiates the protocol to be
used for the current session. If the server does not support HTTP/1.1, it may
inform the client in the response to use HTTP/1.0
○ Optional request headers: Accept, Accept-Language, Authorization, Keep-Alive,
used to negotiate with the server and ask it to deliver the preferred
contents.
○ Optional request body: normally empty for GET; the query string is sent as
part of the request-URI
HTTP Request methods- POST
● POST: submit data to the server for further processing
○ Format: POST request-URI HTTP-version
■ Content-Type: mime-type
■ Content-Length: number-of-bytes
■ (other optional request headers)
■ (blank line)
■ (URL-encoded query string)- in the body
○ POST: the method name is case sensitive and must be in uppercase
○ Request-URI: resource path, must begin from the root “/” of the document
base directory
○ HTTP-Version: HTTP/1.0 or HTTP/1.1. The client negotiates the protocol to be
used for the current session. If the server does not support HTTP/1.1, it may
inform the client in the response to use HTTP/1.0
○ Optional request headers: Accept, Accept-Language, Authorization, Keep-Alive,
used to negotiate with the server and ask it to deliver the preferred
contents.
○ Request body: contains the URL-encoded query string
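A sketch of a POST message with a URL-encoded body. The form fields, the path `/submit`, and the host name are hypothetical; the point is that `Content-Length` must match the number of bytes in the body.

```python
from urllib.parse import urlencode

form = {"name": "Ana", "city": "New York"}    # hypothetical form fields
body = urlencode(form).encode("ascii")        # URL-encoded query string

request = (
    "POST /submit HTTP/1.1\r\n"
    "Host: www.example.com\r\n"
    "Content-Type: application/x-www-form-urlencoded\r\n"
    f"Content-Length: {len(body)}\r\n"
    "\r\n"
).encode("ascii") + body

print(request.decode("ascii"))
```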
HTTP Response Message
● HTTP Response Message Structure:
● Status line: contains the status code
○ HTTP-version status-code reason-phrase
● Response headers
○ Content-Type
○ Content-Length
○ Keep-Alive
● Body
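The response structure can be illustrated by parsing a raw message. This is a simplified sketch: `parse_response` is an invented helper, the sample response is hand-written, and features such as chunked encoding or repeated headers are not handled.

```python
def parse_response(raw: bytes):
    """Split a raw HTTP response into status line parts, headers and body."""
    head, _, body = raw.partition(b"\r\n\r\n")          # blank line separates head and body
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    version, code, reason = status_line.split(" ", 2)   # HTTP-version status-code reason-phrase
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, int(code), reason, headers, body

raw_response = (b"HTTP/1.1 200 OK\r\n"
                b"Content-Type: text/html\r\n"
                b"Content-Length: 14\r\n"
                b"\r\n"
                b"<h1>Hello</h1>")
version, code, reason, headers, body = parse_response(raw_response)
print(version, code, reason, headers, body)
```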
HTTP Status Codes
● HTTP Status Codes:
○ Informational: 1XX
■ Provisional response, only the Status-Line and optional headers
■ No required headers
■ Most important
● 100: Continue: client should continue with the request
● Successful: 2XX
○ 200 OK: Request has succeeded.
○ 201 Created: Newly created resource can be referenced by the URI
in the entity of the response
○ 202 Accepted: processing has not completed
○ 204 No Content: Request fulfilled, but no need for returning an
entity body (e.g. PUT request)
HTTP Status Codes (2)
● Redirection: 3XX
○ Further action needs to be taken by the client to fulfill the request.
○ 301 Moved Permanently
○ 302 Found
○ 304 Not Modified
● Client Error: 4XX
○ Cases where the client made an error in building the request
○ 400 Bad Request: Malformed syntax. The request should not be repeated
without modifications.
○ 401 Unauthorized: User authentication required. The response includes a
WWW-Authenticate header field with the challenge applicable to the
requested resource. The request may be repeated with an Authorization header.
○ 409 Conflict: Request could not be completed due to a conflict with the
current state of the resource. Acceptable when the user would be able to
solve the conflict and re-submit the request. Most often: PUT requests.
○ 412 Precondition Failed: One or more preconditions given in the request
headers evaluated to false.
HTTP Status Codes (3)
● Server error: 5XX
○ Server-side errors: the server is aware that it has erred or is unable
to perform the request.
○ 500 Internal Server Error: The server encountered an
unexpected condition which prevented it from fulfilling
the request. A server should not use this status code
for client-caused problems such as invalid data.
○ 502 Bad Gateway: The server, acting as a proxy, received
an invalid response from an upstream server.
○ 503 Service Unavailable: temporary overloading or maintenance of
the server
○ 504 Gateway Timeout
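The status classes follow directly from the first digit of the code. A short sketch using the standard `http.HTTPStatus` enum; the `CLASSES` table and the `describe` helper are names introduced here.

```python
from http import HTTPStatus

CLASSES = {1: "Informational", 2: "Successful", 3: "Redirection",
           4: "Client Error", 5: "Server Error"}

def describe(code: int) -> str:
    """Return the code, its standard reason phrase and its class."""
    return f"{code} {HTTPStatus(code).phrase} ({CLASSES[code // 100]})"

for code in (100, 204, 304, 412, 503):
    print(describe(code))
```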
HTTPS- HTTP Secure
● HTTP over SSL/TLS: secure communication over a computer
network
● Classic HTTP is unencrypted: attackers can gain access to
website data and sensitive information.
● The connection is encrypted by TLS (Transport Layer
Security) or its predecessor SSL (Secure Sockets Layer)
● Public Key Infrastructure (PKI) system.
● 2 keys are used for encrypting communication:
○ public key: distributed to anybody who needs to
encrypt information for the server
○ private key: kept on the server
HTTPS(2)
● HTTPS connection mechanism:
○ When an HTTPS connection is requested,
the website sends its SSL certificate (which
contains its public key) to the browser
○ The browser and the website initiate the SSL
handshake: generation of shared secrets
to establish a uniquely secured connection
HTTPS-SSL Negotiation handshake
1. Client Hello: information the server needs to communicate with the client using SSL: SSL version number, cipher
settings, session-specific data.
2. Server Hello: information the client needs to communicate with the server using SSL: SSL version number, cipher
settings, session-specific data, server certificate (public key).
3. Authentication and Pre-Master Secret:
- The client authenticates the server certificate (Common Name/Date/Issuer)
- The client (depending on the cipher) creates the pre-master secret of the session
- The client encrypts it with the server’s public key and sends the encrypted pre-master secret to the server.
4. Decryption and Master Secret:
- The server uses its private key to decrypt the pre-master secret
- Both server and client perform steps to generate the master secret with the agreed cipher
5. Generate Session Keys:
- Both client and server use the master secret to generate the (symmetric) session keys, used to encrypt/decrypt data
during the SSL session.
6. Encryption with Session Key: client and server exchange messages to inform each other that future messages will be encrypted.
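In application code the handshake steps above are normally carried out by a TLS library. The sketch below uses Python's `ssl` module (which negotiates TLS, the successor of SSL); `open_tls` is a hypothetical helper that needs network access to a real server, so it is defined but not called here.

```python
import socket
import ssl

# Default client settings: verify the server certificate against the
# system CA store and check that it matches the requested host name.
context = ssl.create_default_context()

def open_tls(host: str, port: int = 443):
    """Run the handshake against a server and report the negotiated parameters."""
    with socket.create_connection((host, port), timeout=5) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.version(), tls.cipher(), tls.getpeercert()["subject"]

print(context.verify_mode, context.check_hostname)
```

Calling `open_tls` with a reachable host name would return the protocol version, the agreed cipher and the certificate subject after the handshake completes.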
HTTP-SSL Handshake(2)
Digital Certificate Structure(X.509 Specification)

● Serial Number: Used to uniquely identify the certificate.


● Subject: The person, or entity identified.
● Signature Algorithm: The algorithm used to create the signature.
● Signature: The actual signature to verify that it came from the issuer.
● Issuer: The entity that verified the information and issued the certificate.
● Valid-From: The date the certificate is first valid from.
● Valid-To: The expiration date.
● Key-Usage: Purpose of the public key (e.g. encipherment, signature,
certificate signing...).
● Public Key: The public key.
● Thumbprint Algorithm: The algorithm used to hash the public key
certificate.
● Thumbprint (also known as fingerprint): The hash itself, used as an
abbreviated form of the public key certificate.
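A thumbprint is simply a hash of the encoded certificate. In this sketch the input is placeholder bytes rather than a real DER-encoded certificate, and the function name `thumbprint` and the colon-separated output format are choices made for illustration.

```python
import hashlib

def thumbprint(der_certificate: bytes, algorithm: str = "sha256") -> str:
    """Hash the certificate bytes and format the digest as colon-separated hex."""
    digest = hashlib.new(algorithm, der_certificate).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))

placeholder = b"not a real certificate"   # stand-in for DER-encoded certificate bytes
fp = thumbprint(placeholder)
print(fp)
```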
State Management
● HTTP is stateless: mechanisms for state
management need to be implemented in
the web applications.
● Mechanisms:
○ On the client side: web browser (cookies)
○ On the server side: web server (sessions)
HTTP cookies
● HTTP cookies: stored in the browser. Every time the user
requests the website, the browser sends the cookie back to
the server to notify it of previous activity.
● Cookie types:
○ Session cookie (“in memory”, “transient”). Deleted after the
browser is closed. Has no expiration date assigned.
○ Persistent cookie: expires at the specified date or after
a specific length of time.
○ Secure cookie: transmitted only over an encrypted (HTTPS)
connection.
○ HttpOnly cookie: sent only in HTTP(S) requests. Cannot be
accessed via non-HTTP APIs such as JavaScript, which mitigates
cookie theft through XSS.
○ Third-party cookie: cookie belonging to a domain different from
the one the user is visiting. Used for tracking.
HTTP Cookie-Components
● Name
● Value
● Attributes(0 or more):
○ Expiry
○ Domain
● Create a cookie: Set-Cookie: name=X;[attributes]
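Cookie headers can be built and parsed with Python's `http.cookies` module. The cookie names, values and domain below are illustrative only.

```python
from http.cookies import SimpleCookie

# Server side: build a Set-Cookie header with a few attributes.
outgoing = SimpleCookie()
outgoing["session_id"] = "abc123"
outgoing["session_id"]["domain"] = "example.com"
outgoing["session_id"]["path"] = "/"
outgoing["session_id"]["max-age"] = 3600    # persistent: expires after one hour
outgoing["session_id"]["secure"] = True
outgoing["session_id"]["httponly"] = True
header = outgoing.output()
print(header)

# Server side, next request: parse the Cookie header sent back by the browser.
incoming = SimpleCookie("session_id=abc123; theme=dark")
print(incoming["theme"].value)
```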
HTTP Cookie- use cases
● Authentication
● Personalization
● Tracking
● Session management

● Cookies travel in clear text over plain HTTP: encrypt sensitive
cookie values or send them only over HTTPS.


HTTP Cookies- Pros and cons
● Pro
○ Reduce processing and storage on the server side.
● Cons
○ Identification
○ Inconsistent state on client and server
○ Inconsistent support by devices
HTTP-Sessions
● State management done on the server.
● Key-value pairs in the SESSION object on the server.
● Stored physically: in-memory, external Database Server etc.
● Centralized state management: the user retrieves the current
state regardless of where the previous requests came from.
● Advantages:
○ Consistency
○ Security
● Drawbacks:
○ Scalability -> server farms scenarios
○ Performance
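A minimal sketch of server-side sessions, assuming an in-memory store inside a single process. `SessionStore` is a hypothetical class; a server farm would need a shared store such as an external database, which is where the scalability drawback comes from.

```python
import secrets

class SessionStore:
    """In-memory session store keyed by a random session ID."""
    def __init__(self):
        self._sessions = {}

    def create(self) -> str:
        session_id = secrets.token_hex(16)     # hard-to-guess identifier
        self._sessions[session_id] = {}        # key-value pairs for this session
        return session_id

    def get(self, session_id: str):
        return self._sessions.get(session_id)  # None for unknown IDs

    def destroy(self, session_id: str) -> None:
        self._sessions.pop(session_id, None)

store = SessionStore()
sid = store.create()                     # would be sent to the browser in a cookie
store.get(sid)["user"] = "alice"         # first request stores state on the server
current_user = store.get(sid)["user"]    # a later request with the same ID sees it
print(current_user)
```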
Web Developer Tools
● Intercepting HTTP requests: Fiddler
● UI & client-side diagnostics: Chrome
Developer Tools, Firebug
● Building raw HTTP requests: Postman
● Testing SOAP services: SoapUI
