
An Introduction to the Cyber World for a Newbie

Internet
The internet that we use today grew out of work begun in the 1960s, with contributions from several people. The initial idea is credited to Leonard Kleinrock, a computer science professor at the University of California, Los Angeles (UCLA), following his first paper on the subject, titled "Information Flow in Large Communication Nets". Initially the internet was not public; its forerunner was ARPAnet (Advanced Research Projects Agency Network). TCP/IP, the communication standard that governs data transfer on the Internet today, was developed for ARPAnet.

What is a Protocol?
A protocol is a standardized means of communication among machines across a network. It is a set of rules, or established procedures, that determines the format and transmission of data. Protocols allow data to be taken apart for faster transmission, transmitted, and then reassembled at the destination in the correct order.
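
The take-apart-and-reassemble idea can be sketched in a few lines of Python. This is only an illustration, not a real network protocol: the function names split_into_packets and reassemble, and the packet size of 8 bytes, are assumptions made up for the example.

```python
import random

def split_into_packets(data: bytes, size: int = 8) -> list[tuple[int, bytes]]:
    """Cut the data into numbered chunks so the order can be restored later."""
    return [(seq, data[start:start + size])
            for seq, start in enumerate(range(0, len(data), size))]

def reassemble(packets: list[tuple[int, bytes]]) -> bytes:
    """Sort the chunks by sequence number and join them back together."""
    return b"".join(chunk for _, chunk in sorted(packets))

message = b"Protocols let machines agree on how to talk."
packets = split_into_packets(message)
random.shuffle(packets)            # simulate packets arriving out of order
restored = reassemble(packets)     # identical to the original message
```

The sequence number attached to each chunk is what lets the receiver rebuild the message even when the chunks arrive shuffled.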

WWW, HTML, URL, HTTP


Tim Berners-Lee led the development of the World Wide Web (WWW), the definition of the Hyper Text Markup Language (HTML) used to create web pages, the Hyper Text Transfer Protocol (HTTP) and Uniform Resource Locators (URLs). These developments took place between 1989 and 1991. Tim Berners-Lee is currently the Director of the World Wide Web Consortium (W3C), the group that sets technical standards for the Web.

WWW
WWW is an abbreviation of World Wide Web, sometimes simply called the Web. It is the system of Internet servers that support specially formatted documents called Web pages, including hyperlinked text, audio and video files, that can be accessed and searched by browsers based on standards such as HTTP and TCP/IP. Other Internet resources, such as FTP, telnet and Usenet, can also be reached through it. It was created in 1989 by the British scientist Tim Berners-Lee while working at the European Particle Physics Laboratory (called CERN) in Switzerland. The World Wide Web consists of all the public Web sites connected to the Internet worldwide, including the client devices (such as computers and cell phones) that access Web content. The WWW is just one of many applications of the Internet and computer networks. A broader definition comes from the World Wide Web Consortium or W3C (the organization founded by Tim Berners-Lee): "The World Wide Web is the universe of network-accessible information, an embodiment of human knowledge." There are several applications called Web browsers that make it easy to access the World Wide Web. The World Wide Web is based on these technologies:

HTML - Hypertext Markup Language
HTTP - Hypertext Transfer Protocol
Web browsers and web servers

HTML
HTML stands for HyperText Markup Language. Also known as the mother tongue of the browser, it is the authoring language used to create documents on the World Wide Web, and hence the publishing language of the Web. HTML is similar to SGML (Standard Generalized Markup Language), although it is not a strict subset. HTML is a language that makes it possible to present information (e.g. scientific research) on the Internet. Developed by the scientist Tim Berners-Lee in 1990, HTML is the "hidden" code that helps us communicate with others on the World Wide Web (WWW). The purpose was to make it easier for scientists at different universities to gain access to each other's research documents. The project became a bigger success than Tim Berners-Lee had ever imagined. By inventing HTML he laid the foundation for the web as it is known today.

H-T-M-L

Hyper is the opposite of linear. Old-fashioned computer programs were necessarily linear, that is, they had a specific order. But with a "hyper" language such as HTML, the user can go anywhere on the web page at any time.

Text is just what you're looking at now: English characters used to make up ordinary words.

Mark-up: HTML defines the structure and layout of a Web document by using a variety of tags and attributes. The correct structure for an HTML document starts with <HTML><HEAD> (enter here what the document is about) </HEAD><BODY> and ends with </BODY></HTML>. All the information one would like to include in the Web page fits in between the <BODY> and </BODY> tags. There are many other tags used to format and lay out the information in a Web page. Tags are also used to specify hypertext links. These allow Web developers to direct users to other Web pages with only a click of the mouse on either an image or word(s).

Language is just that. HTML is the language that computers read in order to understand web pages.
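
To make the tag structure concrete, here is a small sketch that holds a minimal page in a Python string and uses the standard library's html.parser module to list its tags and hypertext links. The page content, the TagCollector class name and the example.com link are invented for illustration.

```python
from html.parser import HTMLParser

# A minimal page: head (what the document is about) and body (the content).
PAGE = """<html>
<head><title>My first page</title></head>
<body>
<p>Hello, Web!</p>
<a href="http://www.example.com/index.html">A link to another page</a>
</body>
</html>"""

class TagCollector(HTMLParser):
    """Records every opening tag and every hyperlink target it sees."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)
        if tag == "a":
            # attrs is a list of (name, value) pairs for the tag
            self.links.extend(value for name, value in attrs if name == "href")

collector = TagCollector()
collector.feed(PAGE)
# collector.tags  -> ['html', 'head', 'title', 'body', 'p', 'a']
# collector.links -> ['http://www.example.com/index.html']
```

A browser does much more than this (layout, styling, scripting), but the first step is the same: reading the tags to learn the structure of the page.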

The first version of HTML was described by Tim Berners-Lee in late 1991. For its first five years (1990-1995), HTML went through a number of revisions and experienced a number of extensions, primarily hosted first at CERN, and then at the IETF. With the creation of the World Wide Web Consortium (W3C), HTML's development changed venue again. HTML is a formal recommendation of the W3C and is generally adhered to by the major browsers, such as Microsoft's Internet Explorer. A first abortive attempt at extending HTML in 1995, known as HTML 3.0, then made way for a more pragmatic approach known as HTML 3.2, which was completed in 1997. HTML 4 followed, first published at the end of 1997 and revised in 1998.

HTML 4 was for many years the current version of HTML. Significant features in HTML 4 are sometimes described in general as dynamic HTML. Extensible Hypertext Markup Language (XHTML) is an extensible form of HTML, a reformulation of HTML 4 in XML. It is sometimes confused with HTML5, which is a separate, newer specification that today's major browsers support. What we see when we view a page on the Internet is the browser's interpretation of HTML. To see the HTML code of a page on the Internet, simply right-click on the page in the browser and choose an option such as "View Source" or "View Page Source".

HTTP
HTTP is the protocol, or language, in which information is passed back and forth between web servers and clients. It allows information to be transmitted and received across the internet. If a website communicates with the browser over plain HTTP, the conversation is not encrypted, and anyone able to observe the traffic can snoop on the computer's conversation with the website. All the user information is contained in the HTTP headers, cookies and query parameters.
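
The following sketch shows why plain HTTP is easy to snoop on: a request is ordinary text, and the query parameters, headers and cookies can be read straight out of it. The request below is made up for illustration (the host, the ExampleBrowser name and the cookie value are all hypothetical), and inspect_request is a simplified helper, not a full HTTP parser.

```python
from http.cookies import SimpleCookie
from urllib.parse import parse_qs, urlsplit

# What a browser might send over an unencrypted connection (hypothetical values).
RAW_REQUEST = (
    "GET /search?q=shoes&page=2 HTTP/1.1\r\n"
    "Host: www.example.com\r\n"
    "User-Agent: ExampleBrowser/1.0\r\n"
    "Cookie: session=abc123\r\n"
    "\r\n"
)

def inspect_request(raw: str) -> dict:
    """Pull the method, path, query parameters, headers and cookies out of the text."""
    head = raw.split("\r\n\r\n", 1)[0]
    request_line, *header_lines = head.split("\r\n")
    method, target, _version = request_line.split(" ")
    headers = dict(line.split(": ", 1) for line in header_lines)
    cookies = SimpleCookie(headers.get("Cookie", ""))
    return {
        "method": method,
        "path": urlsplit(target).path,
        "query": parse_qs(urlsplit(target).query),
        "headers": headers,
        "cookies": {name: morsel.value for name, morsel in cookies.items()},
    }

info = inspect_request(RAW_REQUEST)
# info["query"]   -> {'q': ['shoes'], 'page': ['2']}
# info["cookies"] -> {'session': 'abc123'}
```

Anyone who can see this text on the wire sees the same search terms and session cookie that the server does.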

HTTPS
HTTPS (= HTTP + S) is a URI scheme identical in syntax to the http scheme, where the S stands for secure. It is a simple layering of HTTP over the SSL/TLS protocols to protect the traffic, thus adding security capabilities to standard HTTP communications. It provides authentication of the website and the associated web server that one is dealing with. It protects the user from man-in-the-middle attacks by providing:

Bidirectional encryption of information between the client and the server, which protects against spying on and tampering with the data.

Assurance that the communication between the user and the website is not forged by a third person or an imposter.

HTTPS is especially important over unencrypted networks (such as open Wi-Fi), as anyone on the same local network can "packet sniff" and discover sensitive information. In addition, some free and even paid WLAN networks use packet injection to serve their own advertisements on webpages or just for tricks; this can be exploited maliciously, e.g. by injecting malware and spying on users. Whenever a website is loaded over HTTP instead of HTTPS, the user information and the session are exposed. Therefore, one should always check for https in the address before filling in and submitting information to the server. Another example where HTTPS is important is connections over Tor (an anonymity network), as malicious Tor nodes can damage or alter contents passing through them in an insecure manner and inject malware into the connection. For these security reasons the Tor Project, together with the Electronic Frontier Foundation, developed HTTPS Everywhere, which was included in the Tor Browser Bundle. The https scheme signals the browser to use an added encryption layer of SSL/TLS to protect the traffic. By examining the server's certificate, a client can verify whether the server is really the one it claims to be.
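
As a rough illustration of the certificate check described above, Python's standard ssl module can build a client-side configuration that insists on a valid certificate and a matching host name. Nothing is sent over the network in this sketch, and the helper name make_client_context is an assumption made for the example.

```python
import ssl

def make_client_context() -> ssl.SSLContext:
    """Return TLS settings suitable for a client talking to a web server."""
    # create_default_context() loads the system's trusted certificate
    # authorities and turns on certificate and host name checking.
    return ssl.create_default_context()

context = make_client_context()

# The server must present a certificate signed by a trusted authority...
requires_certificate = context.verify_mode == ssl.CERT_REQUIRED
# ...and the name in that certificate must match the site being visited.
checks_hostname = context.check_hostname
```

A real client would then wrap its network socket with this context (for example with context.wrap_socket, passing the server's host name) before sending any HTTP data, so that a server with a missing or mismatched certificate is rejected during the handshake.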

A Stark Contrast between HTTP and HTTPS

HTTP

URLs begin with http://
Operates on port 80 by default
Is insecure and vulnerable to man-in-the-middle and spying attacks
Is somewhat faster than HTTPS because nothing has to be encrypted; the difference may become evident when large amounts of data are transferred
Works at the application layer

HTTPS

URLs begin with https://
Uses port 443 by default
Is secured over the internet connection and resists man-in-the-middle attacks, as all information is encrypted before being sent to the server
Is not a separate protocol but ordinary HTTP over an encrypted SSL/TLS connection (SSL/TLS authentication comes in two options: mutual and single, i.e. server only)
HTTP itself still works at the application layer; the SSL/TLS encryption sits beneath it, on top of the transport layer

The web server has to be prepared to accept HTTPS connections.
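
The default ports in the comparison above can be expressed as a small lookup. This is a minimal sketch: the names DEFAULT_PORTS, effective_port and is_encrypted are invented for the example, and only the two schemes discussed here are covered.

```python
from urllib.parse import urlsplit

# Default ports for the two schemes compared above.
DEFAULT_PORTS = {"http": 80, "https": 443}

def effective_port(url: str) -> int:
    """Return the port a client would connect to for this URL."""
    parts = urlsplit(url)
    if parts.port is not None:      # an explicit port in the URL wins
        return parts.port
    return DEFAULT_PORTS[parts.scheme]

def is_encrypted(url: str) -> bool:
    """True when the URL asks for HTTP over SSL/TLS."""
    return urlsplit(url).scheme == "https"

# effective_port("http://www.example.com/")        -> 80
# effective_port("https://www.example.com/")       -> 443
# effective_port("https://www.example.com:8443/")  -> 8443
```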

Note: There is a sophisticated type of man-in-the-middle attack called the SSL stripping attack, which was presented at the Black Hat Conference in 2009. This type of attack defeats the security provided by HTTPS by changing the https: link into an http: link, taking advantage of the fact that few internet users actually type "https" into their browser interface: they get to a secure site by clicking on a link, and are thus deceived into believing that they are using secure HTTP when in fact they are using normal HTTP. The attacker then communicates in the clear with the client. This encouraged the development of a countermeasure called HTTP Strict Transport Security (HSTS).
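
With HTTP Strict Transport Security, a site sends a response header named Strict-Transport-Security telling the browser to use only HTTPS for that site for a given number of seconds. The sketch below reads the two most common directives from such a header value. It is a simplified reader written for illustration (parse_hsts is a made-up name, and the one-year max-age is just a sample value), not a complete implementation of the standard.

```python
def parse_hsts(header_value: str) -> dict:
    """Read max-age and includeSubDomains from a Strict-Transport-Security value."""
    policy = {"max_age": None, "include_subdomains": False}
    for directive in header_value.split(";"):
        name, _, value = directive.strip().partition("=")
        name = name.strip().lower()          # directive names are case-insensitive
        if name == "max-age":
            # how long (in seconds) the browser should insist on HTTPS
            policy["max_age"] = int(value.strip().strip('"'))
        elif name == "includesubdomains":
            policy["include_subdomains"] = True
    return policy

policy = parse_hsts("max-age=31536000; includeSubDomains")
# policy -> {'max_age': 31536000, 'include_subdomains': True}
```

Once a browser has stored such a policy, it rewrites http: links to that site into https: before connecting, which removes the opening that SSL stripping relies on.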

Web Server
The main function of a web server is to deliver web pages to clients on request. This means delivery of HTML documents and any additional content that may be included by a document, such as images, style sheets and scripts. Not all Internet servers are part of the World Wide Web.

Web Browser
A web browser is a software application, installed on the computer itself, that serves as the gateway to the Web. It is used to locate, retrieve and display content on the World Wide Web, including Web pages, images, video and other files. In the client/server model, the browser is the client: it runs on a computer, contacts the Web server and requests information. The Web server sends the information back to the Web browser, which displays the results on the computer or other internet-enabled device that supports a browser. A web browser communicates with a web server using the HTTP protocol to download the pages requested by the user, usually by clicking on a hyperlink. A browser translates HTML, the language used to create web pages, into the content displayed in the browser window. Popular web browsers include Google Chrome, Mozilla Firefox, Opera, and Internet Explorer. Web sites and Web browsing exploded in popularity during the mid-1990s.
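
The request-and-response exchange between a browser and a web server can be tried out on a single machine. The sketch below starts a tiny web server on the local computer using Python's standard http.server module, then plays the role of the browser with http.client and fetches one page. The page content and the names PageHandler and serve_and_fetch are assumptions made for the example; a real server and browser do far more.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

PAGE = b"<html><body><h1>Hello from the server</h1></body></html>"

class PageHandler(BaseHTTPRequestHandler):
    """Answers every GET request with the same small HTML page."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()
        self.wfile.write(PAGE)

    def log_message(self, format, *args):
        pass  # keep the example quiet

def serve_and_fetch():
    """Start the server, request "/" like a browser would, then shut down."""
    server = HTTPServer(("127.0.0.1", 0), PageHandler)  # port 0: pick any free port
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        client = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
        client.request("GET", "/")
        response = client.getresponse()
        result = (response.status, response.getheader("Content-Type"), response.read())
        client.close()
        return result
    finally:
        server.shutdown()
        server.server_close()

status, content_type, body = serve_and_fetch()
# status is 200 and body holds the HTML that a browser would then display
```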

URL
URL is an abbreviation of Uniform Resource Locator. It was defined in 1994 by Tim Berners-Lee and the Internet Engineering Task Force (IETF) URI working group. It is a reference to a document or other resource on some machine on the network. In other words, it is the global, unique address of a file that is accessible on the Internet. It is also sometimes referred to as a link. Such a file might be any Web (HTML) page, an image file, or a program such as a Common Gateway Interface (CGI) application or Java applet.

A URL takes the form of a formatted text string used by Web browsers, email clients and other software to identify a network resource on the Internet. On the Web (which uses the Hypertext Transfer Protocol, or HTTP), an example of a URL is:
http://www.example.com/abc/xyz.txt

It specifies the use of the HTTP protocol (the one used by Web browsers), a unique computer named www.example.com, and the location of a text file or page to be accessed on that computer, whose pathname is /abc/xyz.txt. A URL for a particular image on a Web site might look like this:
http://work.example.com/assets/images/pic.gif

A URL for a file meant to be downloaded using the File Transfer Protocol (FTP) would require that the "ftp" protocol be specified, like this hypothetical URL:
ftp://www.example.com/widgets/tool.ps

The earlier examples use the Hypertext Transfer Protocol (HTTP), which is typically used to serve up hypertext documents. The URL is how a computer locates the web page that you are trying to find. URLs also can point to other resources on the network, such as database queries and command output. Network resources are files that can be plain Web pages, other text documents, graphics, or programs. A URL is a formatted string which consists of three parts (substrings):
1. Network protocol
2. Host name or address
3. File or resource location
These substrings are separated by special characters as follows:
protocol :// host / location
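
The three-part layout can be demonstrated by cutting a URL apart at the separators. This is a deliberately naive sketch (split_url is a made-up helper) that only handles the simple protocol://host/location shape described here; real URLs can carry more parts, such as a port number or a reference.

```python
def split_url(url: str) -> tuple[str, str, str]:
    """Split a simple URL into (protocol, host, location)."""
    protocol, _, rest = url.partition("://")     # cut at the first '://'
    host, slash, location = rest.partition("/")  # cut at the first '/' after the host
    return protocol, host, slash + location

parts = split_url("http://www.example.com/abc/xyz.txt")
# parts -> ('http', 'www.example.com', '/abc/xyz.txt')
```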

URL Protocol
The 'protocol' substring defines the network protocol to be used to access a resource. These strings are short names followed by the three characters '://' (a simple naming convention to denote a protocol definition). Typical URL protocols include http://, https://, and ftp://; a few, such as mailto:, are followed by a colon only. The protocol tells the software how to fetch the resource from the computer identified in the rest of the URL. For example, the two URLs below point to two different files at the domain example.com. The first specifies an executable file that should be fetched using the FTP protocol; the second specifies a Web page that should be fetched using the HTTP protocol:
ftp://www.example.com/stuff.exe
http://www.example.com/index.html

URL Host
The 'host' substring identifies a computer or other network device. A host can be given as a domain name, which is looked up in standard Internet databases such as DNS, or directly as an IP address. The resource name is the complete address of the resource. The format of the resource name depends entirely on the protocol used, but for many protocols, including HTTP, the resource name contains one or more of the following components:

Host Name: The name of the machine on which the resource lives.
Filename: The pathname to the file on the machine.
Port Number: The port number to which to connect (typically optional).
Reference: A reference to a named anchor within a resource that usually identifies a specific location within a file (typically optional).
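
Python's standard urllib.parse module can pick these four components out of a URL. The URL below is hypothetical, chosen so that every component, including the optional port number and reference, is present.

```python
from urllib.parse import urlsplit

parts = urlsplit("http://www.example.com:8080/docs/guide.html#install")

host_name = parts.hostname    # 'www.example.com'
port_number = parts.port      # 8080 (None when the URL does not give one)
filename = parts.path         # '/docs/guide.html'
reference = parts.fragment    # 'install' (empty string when absent)
```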

URL Location
The 'location' substring contains a path to one specific network resource on the host. Resources are normally located in a host directory or folder. For example, /bin/accessibleobject/build-url.htm is the location of a Web page, including two subdirectories and the file name. When the location element is omitted, as in http://work.example.com/, the URL conventionally points to the root directory of the host and often to a home page (like 'index.html'). In simple terms, the location is a pathname, a hierarchical description that specifies the location of a file on that computer. An example of a URL is: http://www.example.com/index.html. In this example URL, www.example.com is the host name and example.com is called the domain name. The "index.html" refers to the specific page.

Note: The protocol identifier and the resource name are separated by a colon and two forward slashes. HTTP is just one of many different protocols used to access different types of resources on the net. Other protocols include File Transfer Protocol (FTP), Gopher, File, and News. For many protocols, the host name and the filename are required, while the port number and reference are optional. For example, the resource name for an HTTP URL must specify a server on the network (Host Name) and the path to the document on that machine (Filename); it also can specify a port number and a reference.

Absolute vs. Relative URLs


Full URLs featuring all three substrings are called absolute URLs. In some cases, such as within Web pages, URLs can contain only the location element. These are called relative URLs. Relative URLs are used for efficiency by Web servers and a few other programs when they already know the correct URL protocol and host.
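
A relative URL only makes sense together with the address of the page it appears on. The sketch below uses urljoin from Python's standard urllib.parse module to show how a program that already knows the protocol and host turns relative URLs back into absolute ones. The base address reuses the example URL from earlier, and the relative paths are made up.

```python
from urllib.parse import urljoin

# The page on which the relative URLs appear.
base = "http://www.example.com/abc/xyz.txt"

same_folder = urljoin(base, "pic.gif")
# -> 'http://www.example.com/abc/pic.gif'

from_root = urljoin(base, "/assets/images/pic.gif")
# -> 'http://www.example.com/assets/images/pic.gif'

one_level_up = urljoin(base, "../index.html")
# -> 'http://www.example.com/index.html'
```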

URI
URI is the generic term for all types of names and addresses that refer to objects on the World Wide Web. The term "Web address" is a synonym for a URL that uses the HTTP or HTTPS protocol. A URL is a type of URI (Uniform Resource Identifier, formerly called Universal Resource Identifier). The URL format is specified in RFC 1738, Uniform Resource Locators (URL).

Institutions on the Web


The chart below lists the types of institution that people may come across while accessing the Internet. The terminal portion of the host name indicates the type of institution or the country in which the host is registered. For example, http://www.bbc.co.uk is the web address of the BBC, registered under the commercial (.co) section of the United Kingdom (.uk) domain.

Extension    Type of institution
.com         commercial institution
.biz         commercial institution
.net         commercial institution
.edu         educational institution
.org         not-for-profit organization
.gov         government institution
.mil         military
.int         international institution
.info        unrestricted use
.museum      museums
.name        names of individuals
.pro         lawyers, accountants, and doctors
.aero        aeronautical industry
.coop        cooperative organizations
.jobs        job advertisements
.mobi        mobile-device compatible sites
