The coursework asks us to program an HTTP proxy application with a set of required and optional
features. The basic requirement is to relay the content of a web page from a web server to a
client. In addition, the server side reports some statistics: our application records traffic
throughput and response time. Another optional feature is adding “Proxy” to the title of each web
page. Because this feature is not very stable, we enable it only when the proxy listens on a port
number larger than 8000. To deal with the most difficult part of the application, receiving a
complete response, the server takes a third command-line argument (argv[2] in the source code)
that trades completeness for speed. A value of 0 gives the slowest load but a complete page. From 1
upwards, a larger number means a slower load, so 1 is the fastest setting, although page content
may be lost if the Internet link is not fast enough. The proxy provides service only after the
correct username and password have been entered.
Requirements Analysis
This application was developed on Linux (Ubuntu 9.10) with the C compiler from Build Essential
(version 11.7). Three parties are involved in the communication: the client, the proxy server and
the web server. Since the program communicates over the Internet, it needs sockets, three in
total. The first listens for requests from the client. The second carries the communication
between the client and the proxy server. The third carries the communication between the proxy
server and the web server. Because the proxy does some processing of its own, parts of the HTTP
request and response headers have to be changed. This requires string manipulation with pointers.
A further requirement is to keep the algorithm efficient so that the response time stays short.
The throughput and response-time statistics are collected during retransmission as well.
Preliminary Design
Meeting the basic requirement takes five steps. The first is listening for a request from the
client. The second is receiving the request from the client. The third is sending the request to
the web server. The fourth is receiving the response from the web server. The last is sending the
response to the client. All of these steps except the first are straightforward socket
operations.
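The last two steps (reading the response and forwarding it) amount to a copy loop between two descriptors. The following is a minimal sketch rather than the code used in this project: the names relay and write_all and the 4096-byte buffer are assumptions made for illustration. It also returns the number of bytes forwarded, which is the quantity needed for the throughput statistic.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Write all len bytes to fd, retrying after short writes. Returns 0 or -1. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t w = write(fd, buf, len);
        if (w < 0) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal: retry */
            return -1;
        }
        buf += w;
        len -= (size_t)w;
    }
    return 0;
}

/* Copy bytes from `from` to `to` until end of file on `from`.
 * Returns the number of bytes forwarded, or -1 on error. */
long relay(int from, int to)
{
    char buf[4096];                 /* illustrative size, not from the report */
    long total = 0;

    for (;;) {
        ssize_t n = read(from, buf, sizeof buf);
        if (n == 0)
            break;                  /* peer closed the connection */
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (write_all(to, buf, (size_t)n) < 0)
            return -1;
        total += n;
    }
    return total;
}
```

In a proxy, `from` would be the socket connected to the web server and `to` the socket connected to the client. The loop only ends when the server closes the connection, which is exactly the situation the efficiency argument described later is meant to handle.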
The first step is handled by a fixed module. To serve several requests at the same time, the
program starts several processes. The first is the parent (main) process. It binds the listening
socket to a specific port and waits for requests from clients; each connection is established by
the TCP three-way handshake. When a connection arrives, a new socket is returned and a child
process is created. The child process closes the listening socket and handles all requests on the
new socket. The parent process closes the new socket, since it will not use it any more, and keeps
the listening socket open for further requests.
However, the key part of this program is processing the content that is received and sent. There
are two key stages. The first lies between receiving the request and sending it on. The second
lies between receiving the response and sending it on.
In the first stage, following the definition of HTTP headers in RFC 2616, we check whether the
client is authorised to use the proxy. The request must contain the header “Proxy-Authorization:
Basic …”. If it does not, the proxy server refuses service by sending “HTTP/1.1 407 …”. The
client's browser then asks the user for a username and password and appends a
“Proxy-Authorization” header to its following requests. If the username and password match, the
user may start to use the service. Furthermore, the optional feature (adding “Proxy” to the page
title) needs an uncompressed text response. To get one, the proxy deletes the “Accept-Encoding”
header from the request, so the web server replies with plain text. We also need to extract the
host name and port number from the request in order to fill in the socket address correctly. The
IP address is needed too; it is obtained by querying a DNS server.
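The authentication check described above can be sketched as two small string helpers. This is an illustration under stated assumptions, not the project's code: the function names has_proxy_credentials and build_407 are hypothetical, the search is case-sensitive (HTTP header names are not), and strstr will also match the header text if it happens to appear in a request body.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if the request contains
 * "Proxy-Authorization: Basic <expected_b64>" followed by a line break. */
int has_proxy_credentials(const char *request, const char *expected_b64)
{
    static const char name[] = "Proxy-Authorization: Basic ";
    const char *p = strstr(request, name);
    size_t n = strlen(expected_b64);

    if (p == NULL)
        return 0;
    p += sizeof name - 1;           /* skip the header name and scheme */
    /* The credentials must match exactly and end at the end of the line. */
    return strncmp(p, expected_b64, n) == 0 && (p[n] == '\r' || p[n] == '\n');
}

/* Writes a 407 challenge into out. Returns its length,
 * or -1 if it does not fit into cap bytes. */
int build_407(char *out, size_t cap, const char *realm)
{
    int n = snprintf(out, cap,
                     "HTTP/1.1 407 Proxy Authentication Required\r\n"
                     "Proxy-Authenticate: Basic realm=\"%s\"\r\n"
                     "Content-Length: 0\r\n"
                     "\r\n", realm);
    return (n < 0 || (size_t)n >= cap) ? -1 : n;
}
```

The value after “Basic” is the Base64 encoding of “username:password”, so comparing against one fixed string supports a single account only.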
In the response stage, we obtain the throughput and the response time. The response time is
measured from the moment the request is sent until the first part of the response is received. The
throughput is counted by adding up the number of bytes received. If the optional feature is
enabled by the port number the user has chosen (larger than 8000), the string “Proxy:” is inserted
into the response, and the “Content-Length” header, if there is one, has to be modified or deleted.
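The response-time measurement comes down to taking one timestamp after the request is sent, another when the first bytes of the response arrive, and subtracting. The sketch below assumes the POSIX struct timespec and uses a hypothetical helper name, elapsed_ms. The appendix code uses ftime instead; clock_gettime with CLOCK_MONOTONIC is swapped in here as an assumption because it is not affected by changes to the wall clock.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Milliseconds elapsed between two timestamps (end is assumed not to
 * precede start). The arithmetic is done in nanoseconds so that a
 * negative tv_nsec difference is handled correctly. */
long elapsed_ms(const struct timespec *start, const struct timespec *end)
{
    long long ns = (long long)(end->tv_sec - start->tv_sec) * 1000000000LL
                 + (end->tv_nsec - start->tv_nsec);
    return (long)(ns / 1000000);
}
```

A caller would fill the two timestamps with clock_gettime(CLOCK_MONOTONIC, &t0) right after writing the request and clock_gettime(CLOCK_MONOTONIC, &t1) right after the first successful read, then print elapsed_ms(&t0, &t1).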
Detailed Design
The two key algorithms are both string operations. The first adds a string to, or deletes a string
from, a given string. The second extracts the domain name of the host. Both are based on pointer
operations. This part also describes a workaround for a problem with receiving data.
The first algorithm starts by locating the point of operation with a pointer variable. The
function strstr in <string.h> is the most common way to do this: it returns a pointer to the first
occurrence of a given substring. By assigning the result to a pointer variable, we get the precise
location in the string. To store the modified header, we need to allocate a region of memory large
enough to hold everything after modification. In this application we do that inside a while loop
as the data is being received. However, this can lead to another problem, which we discuss later.
To avoid it, we use a simpler method when receiving the request from the client: we read it into
one large fixed-size buffer. This does not work in all cases; if the HTTP header is larger than
the buffer, part of it is lost. According to our measurements, the buffer length we chose, 2048
bytes, is enough in most cases. Either way, we end up with the original HTTP header. We mark the
start and the end of the string we want to delete, and we find the location where we want to add
something, using strstr. Then we can allocate the memory for the string that will finally hold all
the information.
We describe deleting a string from the original HTTP header first. Using memcpy, we copy the
content from the beginning of the HTTP header up to the beginning of the marked string. (The
number of bytes n to copy is the later pointer minus the earlier one.) Then we use memcpy again to
copy the content from the end of the deleted string to the end of the HTTP header or the end of
the packet. Adding content works almost the same way, in a different order. We only need to find
the place where we want to insert. We first copy from the beginning of the packet up to this
place, then copy our own text, and finally copy everything that remains.
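The delete and insert operations described above can be sketched with strstr, malloc and memcpy as follows. This is a minimal illustration, not the project's code: the function names remove_header and insert_after are hypothetical, and both work on NUL-terminated strings only, so they suit headers and text bodies but not binary responses.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated copy of src with the first line starting at
 * `name` removed, including its trailing "\r\n". Caller frees the result. */
char *remove_header(const char *src, const char *name)
{
    size_t len = strlen(src);
    const char *start = strstr(src, name);
    const char *end = start ? strstr(start, "\r\n") : NULL;
    char *out = malloc(len + 1);
    size_t head;

    if (out == NULL)
        return NULL;
    if (end == NULL) {                  /* header absent: plain copy */
        memcpy(out, src, len + 1);
        return out;
    }
    end += 2;                           /* skip the line terminator */
    head = (size_t)(start - src);       /* later pointer minus earlier one */
    memcpy(out, src, head);
    memcpy(out + head, end, len - (size_t)(end - src) + 1); /* +1 for '\0' */
    return out;
}

/* Returns a newly allocated copy of src with `text` inserted right after
 * the first occurrence of `marker`. Caller frees the result. */
char *insert_after(const char *src, const char *marker, const char *text)
{
    size_t len = strlen(src), tlen = strlen(text);
    const char *at = strstr(src, marker);
    char *out = malloc(len + tlen + 1);
    size_t head;

    if (out == NULL)
        return NULL;
    if (at == NULL) {                   /* marker absent: plain copy */
        memcpy(out, src, len + 1);
        return out;
    }
    head = (size_t)(at - src) + strlen(marker);
    memcpy(out, src, head);                                 /* before */
    memcpy(out + head, text, tlen);                         /* new text */
    memcpy(out + head + tlen, src + head, len - head + 1);  /* remainder */
    return out;
}
```

For example, remove_header(request, "Accept-Encoding:") drops that header line, and insert_after(body, "<title>", "Proxy:") puts the marker text at the start of the page title. After an insertion the body is longer, so any “Content-Length” header has to be adjusted by the length of the inserted text or removed.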
We extract the host from the URL with pointer operations as well. The algorithm relies on two
properties: a URL contains no space (' '), and within the host part a “:” appears only when a port
number follows. Using a for loop, we check each character in turn. A space marks the end of the
URL. If a “:” is found, we take the characters after it and convert them to an integer port
number. Otherwise the default port number 80 is used.
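The scan described above might look like the sketch below. It is written under assumptions that go slightly beyond the text: the helper name parse_host_port is hypothetical, an optional leading “http://” is skipped first, and the scan also stops at “/” or a line break, not only at a space. IPv6 literals in brackets are not handled.

```c
#include <stdlib.h>
#include <string.h>

/* Extracts the host name and port from a string such as
 * "http://www.example.com:8080/path HTTP/1.1". The port defaults to 80.
 * Returns 0 on success, -1 on an empty host, overflow or invalid port. */
int parse_host_port(const char *s, char *host, size_t cap, int *port)
{
    size_t i = 0;

    if (cap == 0)
        return -1;
    if (strncmp(s, "http://", 7) == 0)
        s += 7;                         /* skip the scheme prefix */
    *port = 80;                         /* default when no ':' is present */

    for (; *s != '\0' && *s != ' ' && *s != '/' && *s != '\r' && *s != '\n'; s++) {
        if (*s == ':') {                /* a port number follows */
            long p = strtol(s + 1, NULL, 10);
            if (p <= 0 || p > 65535)
                return -1;
            *port = (int)p;
            break;
        }
        if (i + 1 >= cap)
            return -1;                  /* host does not fit */
        host[i++] = *s;
    }
    host[i] = '\0';
    return i > 0 ? 0 : -1;
}
```

The resulting host name can then be passed to a resolver such as gethostbyname or getaddrinfo to obtain the IP address for the socket address structure.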
We use the third argument as an efficiency factor to tackle the problem of receiving packets. For
various reasons, a read may not return the value we expect, so we cannot rely on it to tell when a
response is complete. We therefore read in a while loop whose condition is always true. This
ensures that the content is displayed completely, but it slows the proxy down sharply. We use a
signal to end the process if there has been nothing to receive for a long time. The third argument
is in fact the argument to the alarm function, where 0 means that no alarm is set at all. The
sooner the process terminates, the faster the application runs.
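One way to picture the alarm-based workaround is the sketch below. It differs from the description in one respect, stated here as an assumption: instead of terminating the whole process, the SIGALRM handler only sets a flag, and read is interrupted (it fails with EINTR) so that the loop can stop and return what it has. The names read_until_idle and on_alarm are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

static volatile sig_atomic_t timed_out;

static void on_alarm(int signo)
{
    (void)signo;
    timed_out = 1;
}

/* Reads from fd until end of file, until buf is full, or until no data
 * has arrived for idle_seconds (0 disables the timer, as with alarm(0)).
 * Returns the number of bytes read, or -1 on error. */
long read_until_idle(int fd, char *buf, size_t cap, unsigned idle_seconds)
{
    struct sigaction sa, old;
    size_t total = 0;
    int failed = 0;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;       /* no SA_RESTART: read() returns EINTR */
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, &old) < 0)
        return -1;

    timed_out = 0;
    while (total < cap) {
        ssize_t n;

        alarm(idle_seconds);        /* (re)arm the idle timer */
        n = read(fd, buf + total, cap - total);
        if (n > 0) {
            total += (size_t)n;
            continue;
        }
        if (n == 0 || (errno == EINTR && timed_out))
            break;                  /* end of file or idle timeout */
        if (errno != EINTR) {
            failed = 1;
            break;
        }
    }
    alarm(0);                       /* cancel any pending alarm */
    sigaction(SIGALRM, &old, NULL); /* restore the previous handler */
    return failed ? -1 : (long)total;
}
```

There is a small window between alarm and read in which the signal could fire before read starts blocking; a more robust design would probably wait with select or poll and a timeout instead of a signal.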
Results
As a sample run, we open the Baidu website through the proxy listening on port 8001. Since the
port number is larger than 8000, “Proxy:” is added to the title of the web page.
<----------------Begin of 1------------------>
-----1----->Client IP:127.0.0.1
-----1----->Port:36966
There is no authorization
Part 2: The second request, with authentication
<----------------Begin of 2------------------>
-----2----->Client IP:127.0.0.1
-----2----->Port:36967
-----2----->To:www.baidu.com(port:80)
-----2----->To:119.75.216.30
Part 3: The HTTP header of the request is then printed out
<---------------Request 2----------------->
GET http://www.baidu.com/ HTTP/1.1
Host: www.baidu.com
User-Agent: Mozilla/5.0 (X11; U; Linux i686; zh-CN; rv:1.9.1.6) Gecko/20091215 Ubuntu/9.10 (karmic) Firefox/3.5.6
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: zh-cn,zh;q=0.5
Accept-Charset: GB2312,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Proxy-Connection: keep-alive
Cookie: BAIDUID=C7D2B7BD925D538F65E0CE023C253517:FG=1
Cache-Control: max-age=0, max-age=0
Proxy-Authorization: Basic U2NodW1hY2hlcjpkYXZpZDg4OTI5
Part 4: The response from the web server then arrives and the response time is recorded. Since the
packet contains an <html> tag, the proxy prints a hint that a new site has been opened.
<-----------Response 2----------->
HTTP/1.1 200 OK
Date: Sat, 02 Jan 2010 04:09:29 GMT
Server: BWS/1.0
Content-Length: 3644
Content-Type: text/html;charset=gb2312
Cache-Control: private
Expires: Sat, 02 Jan 2010 04:09:29 GMT
<----------------Begin of 3------------------>
-----3----->Client IP:127.0.0.1
-----3----->Port:36969
-----3----->To:www.baidu.com(port:80)
-----3----->To:119.75.216.30
<---------------Request 3----------------->
GET http://www.baidu.com/img/baidu_logo.gif HTTP/1.1
Host: www.baidu.com
User-Agent: Mozilla/5.0 (X11; U; Linux i686; zh-CN; rv:1.9.1.6) Gecko/20091215 Ubuntu/9.10 (karmic) Firefox/3.5.6
Accept: image/png,image/*;q=0.8,*/*;q=0.5
Accept-Language: zh-cn,zh;q=0.5
Accept-Charset: GB2312,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Proxy-Connection: keep-alive
Referer: http://www.baidu.com/
Cookie: BAIDUID=C7D2B7BD925D538F65E0CE023C253517:FG=1
If-Modified-Since: Wed, 30 Jul 2008 10:23:00 GMT
If-None-Match: "5d1-48904104"
Proxy-Authorization: Basic U2NodW1hY2hlcjpkYXZpZDg4OTI5
Cache-Control: max-age=0
<-----------Response 3----------->
HTTP/1.1 304 Not Modified
Date: Sat, 02 Jan 2010 04:09:29 GMT
Server: Apache
ETag: "5d1-48904104"
Expires: Tue, 31 Dec 2019 04:09:29 GMT
Cache-Control: max-age=315360000
<----------------Begin of 4------------------>
-----4----->Client IP:127.0.0.1
-----4----->Port:36970
-----4----->To:gimg.baidu.com(port:80)
-----4----->To:220.181.6.68
<---------------Request 4----------------->
GET http://gimg.baidu.com/img/gs.gif HTTP/1.1
Host: gimg.baidu.com
User-Agent: Mozilla/5.0 (X11; U; Linux i686; zh-CN; rv:1.9.1.6) Gecko/20091215 Ubuntu/9.10 (karmic) Firefox/3.5.6
Accept: image/png,image/*;q=0.8,*/*;q=0.5
Accept-Language: zh-cn,zh;q=0.5
Accept-Charset: GB2312,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Proxy-Connection: keep-alive
Referer: http://www.baidu.com/
Cookie: BAIDUID=C7D2B7BD925D538F65E0CE023C253517:FG=1
If-Modified-Since: Fri, 11 Aug 2006 04:20:15 GMT
Proxy-Authorization: Basic U2NodW1hY2hlcjpkYXZpZDg4OTI5
Cache-Control: max-age=0
<-----------Response 4----------->
HTTP/1.1 304 Not Modified
Date: Sat, 02 Jan 2010 04:09:29 GMT
Server: Apache
Connection: close
Expires: Tue, 31 Dec 2019 04:09:29 GMT
Cache-Control: max-age=315360000
>>>>>>>>>>>>>>>>4196 Bytes have been transmitted
<----------------Begin of 5------------------>
-----5----->Client IP:127.0.0.1
-----5----->Port:36971
-----5----->To:www.baidu.com(port:80)
-----5----->To:119.75.213.61
<---------------Request 5----------------->
GET http://www.baidu.com/js/bdsug.js?v=1.1.0.3 HTTP/1.1
Host: www.baidu.com
User-Agent: Mozilla/5.0 (X11; U; Linux i686; zh-CN; rv:1.9.1.6) Gecko/20091215 Ubuntu/9.10 (karmic) Firefox/3.5.6
Accept: */*
Accept-Language: zh-cn,zh;q=0.5
Accept-Charset: GB2312,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Proxy-Connection: keep-alive
Referer: http://www.baidu.com/
Cookie: BAIDUID=C7D2B7BD925D538F65E0CE023C253517:FG=1
If-Modified-Since: Mon, 29 Jun 2009 09:55:00 GMT
If-None-Match: "1ff1-4a488f74"
Proxy-Authorization: Basic U2NodW1hY2hlcjpkYXZpZDg4OTI5
Cache-Control: max-age=0
<-----------Response 5----------->
HTTP/1.1 304 Not Modified
Date: Sat, 02 Jan 2010 04:09:30 GMT
Server: Apache
ETag: "1ff1-4a488f74"
<----------------Begin of 6------------------>
-----6----->Client IP:127.0.0.1
-----6----->Port:36975
-----6----->To:fxfeeds.mozilla.com(port:80)
-----6----->To:63.245.209.93
<---------------Request 6----------------->
GET http://fxfeeds.mozilla.com/en-US/firefox/headlines.xml HTTP/1.1
Host: fxfeeds.mozilla.com
User-Agent: Mozilla/5.0 (X11; U; Linux i686; zh-CN; rv:1.9.1.6) Gecko/20091215 Ubuntu/9.10 (karmic) Firefox/3.5.6
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: zh-cn,zh;q=0.5
Accept-Charset: GB2312,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Proxy-Connection: keep-alive
X-Moz: livebookmarks
Proxy-Authorization: Basic U2NodW1hY2hlcjpkYXZpZDg4OTI5
Cache-Control: max-age=0
<<<<<<<<<<<<<<<<<<Response Time of Request 6 ------ <433ms
==========A websites has been opened=========
<-----------Response 6----------->
HTTP/1.1 302 Found
Server: Apache
Content-Type: text/html; charset=iso-8859-1
Date: Sat, 02 Jan 2010 04:09:43 GMT
Location: http://fxfeeds.mozilla.com/firefox/headlines.xml
Expires: Sat, 02 Jan 2011 04:09:20 GMT
X-Cache-Info: caching
Content-Length: 232
<----------------Begin of 7------------------>
-----7----->Client IP:127.0.0.1
-----7----->Port:36977
-----7----->To:fxfeeds.mozilla.com(port:80)
-----7----->To:63.245.209.93
<---------------Request 7----------------->
GET http://fxfeeds.mozilla.com/firefox/headlines.xml HTTP/1.1
Host: fxfeeds.mozilla.com
User-Agent: Mozilla/5.0 (X11; U; Linux i686; zh-CN; rv:1.9.1.6) Gecko/20091215 Ubuntu/9.10 (karmic) Firefox/3.5.6
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: zh-cn,zh;q=0.5
Accept-Charset: GB2312,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Proxy-Connection: keep-alive
Proxy-Authorization: Basic U2NodW1hY2hlcjpkYXZpZDg4OTI5
Cache-Control: max-age=0
<-----------Response 7----------->
HTTP/1.1 302 Found
Server: Apache
Content-Type: text/html; charset=iso-8859-1
Date: Sat, 02 Jan 2010 04:09:43 GMT
Location: http://newsrss.bbc.co.uk/rss/newsonline_world_edition/front_page/rss.xml
Expires: Sat, 02 Jan 2011 04:09:44 GMT
Accept-Ranges: bytes
X-Cache-Info: caching
Content-Length: 256
<----------------Begin of 8------------------>
-----8----->Client IP:127.0.0.1
-----8----->Port:36979
-----8----->To:newsrss.bbc.co.uk(port:80)
-----8----->To:212.58.226.143
<---------------Request 8----------------->
GET http://newsrss.bbc.co.uk/rss/newsonline_world_edition/front_page/rss.xml HTTP/1.1
Host: newsrss.bbc.co.uk
User-Agent: Mozilla/5.0 (X11; U; Linux i686; zh-CN; rv:1.9.1.6) Gecko/20091215 Ubuntu/9.10 (karmic) Firefox/3.5.6
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: zh-cn,zh;q=0.5
Accept-Charset: GB2312,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Proxy-Connection: keep-alive
Cookie: BBC-UID=744bd10cb8a47369bcfdde87c1991b813c20c00ac0b081cfc2a936e4de9ad88c0Mozilla%2f5%2e0%20%28X11%3b%20U%3b%20Linux%20i686%3b%20en%2dUS%3b%20rv%3a1%2e9%2e1%2e3%29%20Gecko%2f20091020%20Ubuntu%2f9%2e10%20%28karmic%29%20Firefox%2f3%2e5%2e3
If-Modified-Since: Wed, 30 Dec 2009 03:34:40 GMT
If-None-Match: "6ca9-47be9d18b3400"
Proxy-Authorization: Basic U2NodW1hY2hlcjpkYXZpZDg4OTI5
Cache-Control: max-age=0
<-----------Response 8----------->
HTTP/1.1 200 OK
Date: Sat, 02 Jan 2010 04:09:44 GMT
Server: Apache
Vary: Host
Last-Modified: Sat, 02 Jan 2010 04:06:00 GMT
ETag: "709e-47c269b1fda00"
Accept-Ranges: bytes
Content-Length: 28830
Cache-Control: max-age=60
Expires: Sat, 02 Jan 2010 04:10:44 GMT
Content-Type: text/xml
2: The website shown with a modified title (since the server port number is larger than 8000)
Appendix: Source Codes
#include<stdio.h>
#include<string.h>
#include<stdlib.h>
#include<netdb.h>
#include<sys/socket.h>
#include<fcntl.h>
#include<arpa/inet.h>
#include<sys/timeb.h>
#include<unistd.h>
#include<signal.h>
void proxyProcess(int,int,int);
if(signal(SIGALRM, overRun_handler)==SIG_ERR){
    perror("Can't register signal handler");
    exit(1);
}
if(argc != 3){
    printf("Arguments are needed: %s <port number> <connection time>\n", argv[0]);
    exit(1);
}
if (atoi(argv[1]) <= 0){ // Check value of port
    printf("Invalid Port Number\n");
    exit(1);
}
if (atoi(argv[2]) < 0){ // Check value of time
    printf("Invalid connection time\nNo time set\n");
}
else setTime = atoi(argv[2]);
if (port>8000) mode = 1; // Ports above 8000 enable the title modification
else mode = 0;
memset(&sin, 0, sizeof(sin));
memset(&cin, 0, sizeof(cin));
sin.sin_family = AF_INET;
sin.sin_addr.s_addr = INADDR_ANY;
sin.sin_port = htons(port);
while(1){
    len = sizeof(cin); // accept() needs the size of the address buffer on entry
    csd = accept(sd, (struct sockaddr *)(&cin), &len);
    if (csd < 0){ // Check the result before forking
        perror("Fail to Accept");
        exit(1);
    }
    requestNum++;
    if (fork() == 0){ // Start Proxy Process in Child Process
        if((close(sd))<0){
            perror("Fail to Close (sd)");
        }
        printf("\n<----------------Begin of %d------------------>\n", requestNum);
        printf("-----%d----->Client IP:%s\n-----%d----->Port:%d\n",
               requestNum, inet_ntoa(cin.sin_addr),
               requestNum, ntohs(cin.sin_port));
        proxyProcess(csd,requestNum, mode);
        if((close(csd))<0){
            perror("Fail to Close (csd)");
        }
        exit(0);
    }
    close(csd); // Parent: the child handles this connection
}
close(sd);
return 0;
}
// Initialize Buffers
memset(buf, 0, sizeof(buf));
memset(url1, 0, sizeof(url1));
memset(url2, 0, sizeof(url2));
// Read Request
n = read (csd, buf, sizeof(buf) - 1); // Leave room for the terminating '\0'
if ((strstr(buf,"Proxy-Authorization: Basic U2NodW1hY2hlcjpkYXZpZDg4OTI5"))==0){
    printf("There is no authorization\n");
    char *msg = "HTTP/1.1 407 Proxy Authentication Required\r\n"
                "Content-Type: text/html\r\n"
                "Proxy-Authenticate: Basic realm=\"Username and Password\"\r\n\r\n";
    write(csd, msg, strlen(msg));
    close(csd);
    exit(0);
}
// Get IP address
web.sin_family = AF_INET;
web.sin_port = htons(iport);
web.sin_addr.s_addr = inet_addr(ip);
if((wsd = socket(PF_INET, SOCK_STREAM, 0))<0){
    perror("Fail to Socket (wsd)");
    exit(1);
}
if((connect(wsd, (struct sockaddr *)(&web), sizeof(web)))<0){
    perror("Fail to connect (wsd)");
    exit(1);
}
// Print Request
printf("\n<---------------Request %d----------------->\n%s\n",
requestNum, request);
// Send request
write(wsd, request, strlen(request));
// Manage RAM
free(request);
// Initialize count
count = 0;
// Start to time
ftime(&time1);
// End Report
printf(">>>>>>>>>>>>>>>>%ld Bytes have been transmitted\n", thout);
printf("\n<---------End of Process %d---------->\n",requestNum);
}