
IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, VOL. 7, NO. 2, APRIL 2012

RIHT: A Novel Hybrid IP Traceback Scheme


Ming-Hour Yang and Ming-Chien Yang
Abstract: Because the Internet has been widely applied in various fields, more and more network security issues emerge and catch people's attention. However, adversaries often hide themselves by spoofing their own IP addresses and then launch attacks. For this reason, researchers have proposed many traceback schemes to trace the source of these attacks. Some use only one packet in their packet logging schemes to achieve IP tracking. Others combine packet marking with packet logging and therefore create hybrid IP traceback schemes that demand less storage but require a longer search. In this paper, we propose a new hybrid IP traceback scheme with efficient packet logging. It aims to have a fixed storage requirement for each router (under 320 KB, according to CAIDA's skitter data set) in packet logging without the need to refresh the logged tracking information, and to achieve zero false positive and false negative rates in attack-path reconstruction. In addition, we use a packet's marking field to censor attack traffic on its upstream routers. Lastly, we simulate and analyze our scheme, in comparison with other related research, in the following aspects: storage requirement, computation, and accuracy.

Index Terms: DoS/DDoS attack, hybrid IP traceback, IP spoofing, packet logging, packet marking.

Manuscript received June 11, 2011; revised August 24, 2011; accepted September 12, 2011. Date of current version March 08, 2012. This work was supported by the National Science Council under Grants NSC-100-2218-E-033-010. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Yong Guan.
M.-H. Yang is with the Department of Information and Computer Engineering, Chung Yuan Christian University, Taiwan (e-mail: mhyang@cycu.edu.tw).
M.-C. Yang is with the Department of Information Application, Aletheia University, Taiwan (e-mail: mcyang@mail.au.edu.tw).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TIFS.2011.2169960
1556-6013/$31.00 © 2012 IEEE

I. INTRODUCTION

With the rapid growth of the Internet, various Internet applications are developed for different kinds of users. Due to the decreasing cost of Internet access and its increasing availability from a plethora of devices and applications, the impact of attacks becomes more significant. To disrupt the service of a server, sophisticated attackers may launch a distributed denial of service (DDoS) attack. Based on the number of packets needed to deny the service of a server, we can categorize DDoS attacks into flooding-based attacks and software exploit attacks [10]. The major signature of flooding-based attacks is a huge amount of forged source packets that exhaust a victim's limited resources. The other type of DoS attack, the software exploit attack, attacks a host through the host's vulnerabilities with few packets (e.g., the Teardrop attack and the LAND attack). Since most edge routers do not check the origin's address of a packet, core routers have difficulties in recognizing the source of packets. The source IP address in a packet can be spoofed when an attacker wants to hide himself from tracing. Therefore, IP spoofing makes it hard for hosts to defend against a DDoS attack. For these reasons, developing a mechanism to locate the real source of impersonation attacks has become an important issue.

For tracing the real source of flooding-based attack packets, Burch and Cheswick [6] propose a link test scheme using the UDP chargen service to generate an extra load on upstream links. The extra load may compete against the attack packets and perturb the attack traffic, so that we can find the upstream router through which the attack traffic passes. Bellovin et al. [5] propose an iTrace scheme, which generates an ICMP packet with the forward and backward links of the router to leverage the triggering packet. The victim host collects the ICMP messages to reconstruct the attack path.

Because the previous schemes need extra packets to trace the origin of attack packets, packet marking approaches are introduced to mark the router or path information on the triggering packets. Packet marking can be put into two categories, deterministic packet marking (DPM) and probabilistic packet marking (PPM). Belenky and Ansari [3], [4] propose DPM traceback schemes to mark a border router's IP address on the passing packets. However, the IP header's identification field is not large enough to store the full IP address. For this reason, the border router divides its IP address into several segments and computes the digest of its IP address. Then it randomly chooses a segment and marks it, together with the digest, on its passing packets. When the destination host receives enough packets, it can use the digest to assemble the different segments. On the other hand, Savage et al. [18] propose a PPM scheme with edge sampling, which is called FMS. Song and Perrig [20] propose the AMS scheme. Yaar et al. [24] propose the FIT scheme. Al-Duwari and Govindarasu [1] propose the probabilistic pipelined packet marking (PPPM) scheme. Gong and Sarac [26] propose a practical packet marking scheme. These probability-based schemes require routers to mark partial path information, with some probability, on the packets which pass through them. That is to say, if a victim collects enough marked packets, it can reconstruct the full attack path.

Since flooding-based traceback schemes need to collect a large amount of attack packets to find the origin of attacks, these schemes are not suitable for tracing the origins of software exploit attacks. Most current tracing schemes that are designed for software exploits can be categorized into three groups: single packet, packet logging [9], [19], and hybrid IP traceback [1], [8], [9], [15], [16], [25]. The basic idea of packet logging is to log a packet's information on routers. Huffman codes [8], Modulo/Reverse modulo Technique (MRT) [15], and MOdulo/REverse modulo (MORE) [16] use interface numbers of routers, instead of partial IP or link information, to mark a packet's route information. Each of these methods marks routers' interface numbers on a packet's IP header along a route. However, a packet's IP header has rather limited space for marking and therefore cannot always afford to record the full route information. So, they integrate packet logging into their marking schemes by allowing a packet's marking field to be temporarily logged on routers. We find that these tracing methods still require high storage on logged routers. Also, their schemes cannot avoid a false positive problem because the packet digests in each log table may collide, and their schemes even have a false negative problem when routers refresh logged data. Apart from these, we find their exhaustive searching quite inefficient in path reconstruction.

For these reasons, we propose a traceback scheme that marks routers' interface numbers and integrates packet logging with a hash table (RIHT) to deal with these logging and marking issues in IP traceback. RIHT is a hybrid IP traceback scheme designed to achieve the following properties:
1) Our storage requirement for an arbitrary router is bounded above by the number of paths to the router, and thus no router needs to refresh logged tracking information.
2) Our scheme achieves zero false positive and false negative rates in attack-path reconstruction.
3) We have higher efficiency in path reconstruction.
4) Our scheme can censor attack traffic.

In Section II, we survey related research on Huffman codes, MRT, and MORE schemes. Section III details our traceback scheme. Section IV compares our scheme with existing hybrid IP traceback approaches. Our conclusion is drawn in Section V.

II. RELATED WORK

Most current single packet traceback schemes tend to log packets' information on routers. For instance, Snoeren et al. [19] propose a system, SPIE, that digests the unchanged parts of a packet and uses a Bloom filter to log the digest. However, this scheme requires large storage space and has a false positive problem in the Bloom filter. For this reason, Zhang and Guan [25] propose TOPO to improve the efficiency and precision of SPIE, but TOPO still needs large storage capacity and inevitably has a false positive problem because of the Bloom filter. The hybrid IP traceback schemes are introduced to mitigate the storage problem of logging-based traceback schemes. Gong and Sarac [9] propose a hybrid IP traceback scheme called Hybrid IP Traceback (HIT), which combines packet marking and packet logging. HIT uses packet marking to reduce the number of routers required for logging. Other researchers have proposed new schemes to further reduce the storage requirement for router logging and to decrease the number of routers required for logging, e.g., Huffman codes [8], Modulo/Reverse modulo Technique (MRT) [15], and MOdulo/REverse modulo (MORE) [16].

Since these schemes use interface numbers of routers for marking, they assume a router set comprising the routers in a network and require that all the routers support the respective traceback schemes. Also, they use the degree of a router as a parameter in their marking schemes, where the degree is the number of interfaces of the router, not including ports connected to local networks. Here we use $D(R_i)$ to denote the degree of a router $R_i$. Besides, these schemes need to maintain an interface table on each router in advance. This table maps a unique number to each interface of a router along which the router is connected to another router. The interface numbers of $R_i$ are between 0 and $D(R_i) - 1$. For discussion, we denote by $UI_i$ the upstream interface number of a router $R_i$ in a route. In what follows, we use routes and paths interchangeably.

In the marking process, each router puts $UI_i$ into the marking field. Perhaps the simplest way to encode $UI_i$ is by fixed-length coding [8]. However, such an approach does not use a packet's marking field efficiently if $D(R_i)$ is not a power of two. Choi and Dai [8] propose a marking scheme using Huffman coding to reduce the bits required for marking on a packet. It encodes $UI_i$ by Huffman coding according to the traffic of each interface. Their analysis shows that their scheme has better performance when the traffic distribution over the interfaces is unequal.

Malliga and Tamilarasi propose two traceback schemes, namely MRT [15] and MORE [16]. While MRT uses a 32-bit marking field, MORE uses a 16-bit marking field and separates a log table into several parts. They use arithmetic to update the marking fields. In their marking schemes, the new marking field $mark_{new} = mark_{old} \times D(R_i) + UI_i$ is computed by the routers to which a packet is forwarded. In their path reconstruction, the old marking field $mark_{old} = \lfloor mark_{new} / D(R_i) \rfloor$ is computed by the routers to which a packet is traced back; the upstream interface number $UI_i = mark_{new} \,\%\, D(R_i)$ is also computed, where % is the modulo operation, and the packet is sent back to the upstream router along the obtained upstream interface. According to the test results in MRT and MORE, the average number of bits used for marking is smaller than that in Huffman coding.

Fig. 1. Example of traceback schemes that mark router interfaces.

Fig. 1 illustrates the marking process of each traceback scheme which marks interface numbers of routers. Suppose that a packet is delivered from Host to a first router, where the marking field is initialized, and is then marked on two further routers in sequence. As we can see in Fig. 1, the second router receives the packet from upstream interface 1 and the third router receives it from upstream interface 5. In Huffman codes, these two routers encode the interface numbers 1 and 5 as codewords (see the grey cells in Fig. 1), and the reversals of the codewords are appended to the marking field. In path reconstruction, the two routers search for the reversals of the codewords to find the upstream routers. In MRT and MORE, each of the two routers computes the new marking field from the old one and its upstream interface number. In path reconstruction, each router computes the upstream interface number and the old marking field from the marking field it is given. In doing so, the two schemes are able to trace back to the source router.

Even though the marking field of a packet in Huffman codes, MRT, and MORE can store a longer path than in fixed-length coding, the marking field may be full before the packet reaches its destination. In such a situation, they need to log the packet's information on the routers that fail to mark on the marking field. These routers pair the packet digest with the marking field and log the pair into a log table. After logging, the routers clear the marking field and repeat the marking process. When a router needs to recover the marking field of a request packet using its log table, it computes the digest of the request packet and searches the log table using exhaustive search. It can recover the marking field by the above steps.

However, there are the following two problems in the Huffman codes, MRT, and MORE schemes. First, after logging, if the marking field of the packet is still 0 on the adjacent downstream router, that router will be identified as a logged router for the packet while tracing back, and the traceback will fail to find the origin. Second, since the digests in a log table might collide, the schemes suffer from the false positive problem during path reconstruction.

The storage requirement is proportional to the number of logged packets. Unfortunately, in a flooding-based attack, a huge amount of attack packets will be logged on the same router. Thus, a high storage requirement is imposed on the logged router. Moreover, while reconstructing a path, a logged router for a packet needs to search the digests in the log table using exhaustive search in order to find the old marking field. The exhaustive search is not efficient when the log table is big.

Due to the above problems in the Huffman codes, MRT, and MORE schemes, we propose a traceback scheme that marks routers' interface numbers and integrates packet logging with a hash table (RIHT). RIHT has a lower storage requirement and better precision and efficiency than Huffman codes and MRT.

III. RIHT

Like MRT and MORE, RIHT marks interface numbers of routers on packets so as to trace the path of packets. Since the marking field on each packet is limited, our packet-marking scheme may need to log the marking field into a hash table and store the table index on the packet. We repeat this marking/logging process until the packet reaches its destination. After that, we can reverse the process to trace back to the origin of attack packets.

A. Network Topology and Preliminaries

Fig. 2. Network topology.

As the network topology in Fig. 2 shows, a router can be connected to a local network, to other routers, or to both. A border router receives packets from its local network. A core router receives packets from other routers. For example, a router serves as a border router when it receives packets from Host; however, the same router becomes a core router when it receives packets from another router. The assumptions of our scheme are as follows.
1) A router creates an interface table and numbers the upstream interfaces from 0 to $D(R_i) - 1$ in advance.
2) A router knows whether a packet comes from a router or a local network.
3) Such a traceback scheme is viable on every router.
4) The traffic route and network topology may be changed, but not often.

If we use the identification field to mark a packet, it can lead to identification number collision in the reassembling process. In fact, the share of fragmented packets has been getting lower and lower, from 0.25% to 0.06%, according to Stocia et al. [21] and John et al. [11]. The two pieces of research also help support our argument about the effect of MTU usage in TCP. By using MTU, there is no big fragment problem raised in OpSec [27]. John et al. [12] also point out that over 60% of fragmented packets are attacking packets. Therefore, if attackers try to use Encapsulating Security Payload (ESP) packets to evade IDS, their randomly generated ESP packets can never be decrypted at a victim's site because of the lack of proper shared keys. In such a case, the adversaries can only generate a large volume of forged ESP packets to attack a host, consuming the victim's bandwidth and computation resources. If we mark the ESP packets with a low probability, the marked packets are enough for us to trace the attacker's source, and the unmarked fragmented ESP packets can still be reassembled at the destination host.

In some cases, adversaries may compromise a node in the target network first. Then they can use ESP packets in their software exploits, e.g., the Teardrop attack and the LAND attack, which tend to consume the destination host's buffer and computation resources. If we overwrite the fragment field, the attackers are not able to launch Teardrop attacks to deny the service at a victim's site. If we overwrite both the fragment field and the fragment flag, then adversaries can no longer keep a victim waiting for the last fragment.

As mentioned above, the use of the fragment and the identification fields will not affect most legitimate packets. Besides, fragmentation is commonly used for IDS evasion. Thus, when we overwrite these two fields in our traceback scheme, we prevent attackers from using fragmented packets to evade IDS. For this reason, we use an IP header's identification field, flag field, and fragment offset field as a 32-bit marking field, shown in Fig. 3. Notations in this paper are shown in Table I.

Fig. 3. Fields of an IP packet. We use the gray fields as the marking field in RIHT.

TABLE I NOTATIONS

B. Marking and Logging Scheme

Fig. 4. Marking and logging scheme.

When a border router receives a packet from its local network, it sets the packet's marking field to zero and forwards the packet to the next core router. As shown in Fig. 4, when a core router receives a packet $P$, it computes $mark_{new} = P.mark \times (D(R_i) + 1) + UI_i + 1$. If $mark_{new}$ does not overflow, the core router overwrites $P.mark$ with $mark_{new}$ and forwards the packet to the next core router. If $mark_{new}$ overflows, the core router must log $P.mark$ and $UI_i$. That is, it needs to compute $H(P.mark)$ first and use a quadratic probing algorithm to search for $P.mark$ and $UI_i$ in the hash table $HT$. If $P.mark$ and $UI_i$ are not found there, the core router inserts them as a pair into the table. Then, it gets their index in the table and computes $mark_{new} = index \times (D(R_i) + 1)$. Last, it overwrites $P.mark$ with $mark_{new}$ and forwards the packet to the next router.

Fig. 5 exemplifies how we mark and log an 8-bit marking field in our scheme. The first core router receives a packet, computes the new marking value from the packet's marking field and its upstream interface number, and checks whether the result overflows. The result is not greater than 255 and therefore does not overflow, so the router marks the packet and forwards it to the next router. The next router receives the packet and computes its new marking value. Here the result is greater than 255 and therefore overflows, so this router needs to log the packet. First, it takes the packet's marking field and its upstream interface number as a pair, logs the pair into its hash table, and gets the pair's index. Then, it computes the new marking value from the index. This time the value does not overflow, so the router forwards the packet to the next router. That router receives the packet and computes its new marking value, which does not overflow, so it forwards the packet to the next router.

C. Path Reconstruction

When a victim is under attack, it sends to the upstream router a reconstruction request, which includes the attack packet's marking field, termed $mark_{req}$ here. When a router receives a reconstruction request, it tries to find the attack packet's upstream router. As depicted in Fig. 6, it first computes $UI_i = mark_{req} \,\%\, (D(R_i) + 1) - 1$. If $UI_i \neq -1$, which means this packet came from an upstream router along the upstream interface $UI_i$, the requested router then restores the marking field to its premarking status. The router computes $mark_{old} = \lfloor mark_{req} / (D(R_i) + 1) \rfloor$, so that we get the marking field the packet carried at the upstream router, i.e., $mark_{old}$. We then replace the request's $mark_{req}$ with $mark_{old}$ and send the request to the upstream router.

However, if $UI_i = -1$, it means either that the attack packet's marking field and its upstream interface number have been logged on the requested router, or that the requested router itself is the source router. The requested router computes $index = \lfloor mark_{req} / (D(R_i) + 1) \rfloor$, so that we can decide whether the requested router is the source router or not. If index is not zero, meaning this requested router has logged this packet, the router then uses index to access $HT$ and finds $mark_{old}$ and $UI_i$. Next, we use $mark_{old}$ to replace the request's $mark_{req}$ and then send the request to the upstream router. However, if index is zero, this requested router is the source router, and the path reconstruction is done.

We use an example in Fig. 7 to explain how we reconstruct the path by using the packet's marking field. The first requested router receives a request, computes the upstream interface number and the old marking field, and replaces the marking field in the request with the old one, which now becomes 32. Next, it sends the modified request to its upstream router. When that router receives the request, it computes the upstream interface number and index, respectively, as shown in Fig. 7. The results show that the packet has been previously logged in this router's hash table. The router uses the index to access the table and finds the logged marking field and upstream interface number. Subsequently, it uses 242 to overwrite the marking field in the request and then sends the request to its upstream router. After computation, that router uses 60 to overwrite the request's marking field and sends the modified request to the upstream router. Note that the routers off this path are not involved in the path reconstruction.
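The marking, logging, and reconstruction steps of Sections III-B and III-C can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: it uses an 8-bit marking field as in the Fig. 5 example, a toy multiplicative hash (`hash_fn`) in place of the hash functions evaluated later, triangular-number probing as one simple quadratic probe sequence, and it reserves table slot 0 so that an index of zero can only mean "source router". The names `Router`, `forward`, and `reconstruct` are hypothetical.

```python
from dataclasses import dataclass, field

MARK_BITS = 8                      # assumption: 8-bit field, as in the Fig. 5 example
MARK_MAX = (1 << MARK_BITS) - 1    # 255


def hash_fn(mark):
    """Toy multiplicative hash; a stand-in, not one of the paper's hash functions."""
    return (mark * 2654435761) & 0xFFFFFFFF


@dataclass
class Router:
    degree: int                    # D(R): number of upstream router interfaces
    table_size: int = 16           # m: slots in the hash table (power of two)
    table: dict = field(default_factory=dict)   # slot -> (mark, ui)

    def __post_init__(self):
        # A logged mark is index * (D + 1); it must itself fit in the field.
        assert (self.table_size - 1) * (self.degree + 1) <= MARK_MAX

    def _log(self, mark, ui):
        """Find or insert (mark, ui); return its slot. Slot 0 is reserved."""
        m = self.table_size
        h = hash_fn(mark) % m
        for probe in range(m):
            slot = (h + probe * (probe + 1) // 2) % m   # quadratic probe sequence
            if slot == 0:
                continue
            entry = self.table.get(slot)
            if entry is None:
                self.table[slot] = (mark, ui)
                return slot
            if entry == (mark, ui):
                return slot
        raise RuntimeError("hash table full")

    def mark_packet(self, mark, ui):
        """Return the marking field after this core router handles the packet."""
        assert 0 <= ui < self.degree
        base = self.degree + 1
        new_mark = mark * base + ui + 1
        if new_mark > MARK_MAX:                  # overflow: log and store the index
            new_mark = self._log(mark, ui) * base
        return new_mark

    def trace_back(self, mark_req):
        """Return (ui, mark_old), or None if this router is the source."""
        base = self.degree + 1
        ui = mark_req % base - 1
        if ui != -1:                             # marked here, not logged
            return ui, mark_req // base
        index = mark_req // base
        if index == 0:                           # zero mark: source router
            return None
        mark_old, ui = self.table[index]         # direct lookup, no search
        return ui, mark_old


def forward(path, uis):
    """Send one packet through the core routers in `path`."""
    mark = 0                                     # the border router zeroes the field
    for router, ui in zip(path, uis):
        mark = router.mark_packet(mark, ui)
    return mark


def reconstruct(path, mark):
    """Walk the request upstream; return the recovered interfaces and final mark."""
    found = []
    for router in reversed(path):
        step = router.trace_back(mark)
        if step is None:
            break
        ui, mark = step
        found.append(ui)
    return found[::-1], mark
```

With six routers of degree 3 and upstream interfaces `[1, 2, 0, 2, 1, 2]`, the fifth router overflows and logs the pair, and `reconstruct` recovers the full interface sequence and ends with a zero marking field. Adding 1 to the interface number and multiplying by the degree plus one keeps the residue 0 free, which is what lets a router distinguish a logged or source packet from a marked one.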


Fig. 5. Example of RIHT's marking and logging.

Fig. 6. Path reconstruction scheme.

D. RIHT Extension

As for the partial deployment issue in our traceback scheme, each router only needs to know its nearest upstream router which complies with our scheme. Then, the two routers can use a tunnel for direct communication between them. That is, if the adjacent router does not support our traceback, we will not receive any positive feedback from it and will have to query the next one (more than one hop away) [14]. On the other hand, if an attack packet reaches a NAT server before any routers that support our traceback scheme, we can only trace its source to the NAT server. That is to say, we can only find the attacker's LAN, which, however, is sufficient to locate the origin of an attack.

Also, the modification of a router's port numbers may lower the accuracy of our scheme. In this case, we can extend our path reconstruction scheme into a two-layer approach to get around this problem. First, each ISP needs to run our traceback scheme individually. Since every ISP is well aware of the port-number modification, it can exactly identify the incoming and outgoing border routers of an AS through which a packet goes. Second, the victim site needs to run our scheme to query a traceback server in an AS in order to reconstruct an attack path. With this extension, the scheme keeps its high accuracy.

IV. PERFORMANCE EVALUATION AND ANALYSIS

This section analyzes the storage requirement, precision, and performance of RIHT. We find that the Huffman codes' logging scheme is quite similar to that in MRT and MORE, but the Huffman codes' marking performance is limited [15]. For this reason, we only compare our proposed scheme with MRT and MORE. In the following simulations, the environment consists of a PC with an Intel P4 930 3-GHz CPU, 2 GB of RAM, and FreeBSD 6.2.

A. Computation Analysis

In the following, we compare the computing time of logging and path reconstruction in RIHT with that in MRT and MORE. Since RIHT uses a hash table to log, we inevitably have to face the hash table's collision problem. In RIHT, the open addressing [13] method is used to solve this problem. In the open addressing method, when a new entry has to be inserted, the slots are examined, starting with the hashed-to slot and proceeding in some probe sequence, until an unoccupied slot is found. When searching for an entry, the slots are scanned in the same sequence, until either the target record is found or an unused slot is found. Furthermore, to minimize the impact of the collision problem on our scheme, we adopt quadratic probing [13] as the probe sequence because it requires only light computation and is effective at avoiding the clustering problem.

When we deal with the collision problem, we have to take into consideration a hash table's load factor $\alpha$, which directly affects the number of collisions. However, the expected number of collisions depends on which of two situations occurs when logging: successful search or unsuccessful search [13]. We explain the two situations and their relation to the number of collisions as follows. Unsuccessful search means that an entry has not been logged in the hash table and therefore is to be inserted into an empty slot. A probe is performed each time a collision occurs. The expected number of probes in an unsuccessful search using open addressing is at most $1/(1-\alpha)$, assuming uniform hashing. Successful search means that an entry has already been logged in the hash table. The expected number of probes in a successful search using open addressing is at most $\frac{1}{\alpha}\ln\frac{1}{1-\alpha}$, assuming uniform hashing.
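As a quick check on the open-addressing analysis in Section IV-A, the following sketch evaluates the standard uniform-hashing bounds on the expected number of probes, $1/(1-\alpha)$ for an unsuccessful search and $\frac{1}{\alpha}\ln\frac{1}{1-\alpha}$ for a successful one. The function names are hypothetical, and the bounds are the textbook ones for open addressing under the uniform hashing assumption, not measurements of quadratic probing itself.

```python
import math


def unsuccessful_probes_bound(alpha):
    """Upper bound on expected probes when the entry is not yet in the table."""
    if not 0.0 <= alpha < 1.0:
        raise ValueError("load factor must be in [0, 1)")
    return 1.0 / (1.0 - alpha)


def successful_probes_bound(alpha):
    """Upper bound on expected probes when the entry is already in the table."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("load factor must be in (0, 1)")
    return (1.0 / alpha) * math.log(1.0 / (1.0 - alpha))


if __name__ == "__main__":
    for alpha in (0.25, 0.5, 0.75, 0.9):
        print(alpha,
              round(unsuccessful_probes_bound(alpha), 3),
              round(successful_probes_bound(alpha), 3))
```

At a load factor of 0.5 the unsuccessful-search bound is exactly 2 and the successful-search bound is 2 ln 2, about 1.39, which is why keeping each table at most half full bounds both cases by two probes.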


Fig. 7. Example of RIHT's path reconstruction.

Fig. 8. Expected number of probes.

Fig. 9. Computing time of hash functions with different input length.

The expected numbers of probes in the two situations are illustrated in Fig. 8. We can see that if the load factor is at most 0.5, the expected numbers of probes in the two situations are both at most 2, and thus the performance is very good. Accordingly, we allocate enough storage space for the hash table of each router in RIHT so that the load factor of each hash table is at most 0.5. The size of a hash table is discussed in Section IV-B. In the following discussion, we therefore assume that the load factor of each hash table is at most 0.5.

We first compare the logging time of RIHT with that of MRT and MORE. Recall that, when logging, a 32-bit marking field is used as the input to a hash function in RIHT. Therefore, the length of the hash function's input is 4 bytes in our scheme, whereas it is 20 bytes in MRT and MORE. To understand how a hash function's computing time is affected by its input length, we run an experiment. We feed 4-byte and 20-byte inputs into the hash functions MurmurHash2 [2], FNVHash [17], and MD5 [22] to measure their computing time. For each hash function and each input length, we randomly choose 10 inputs and then sum up all the hashing time required, as depicted in Fig. 9. As we can see in Fig. 9, a longer input takes a longer computing time for each of the three hash functions. We also find that FNVHash is sensitive to input length, whereas MD5 is not.

Then we simulate the logging schemes of RIHT, MRT, and MORE on a router using a computer program. In the simulation of RIHT's logging scheme, we initially allocate an empty hash table whose size is a power of two. Then, we randomly choose a set of 32-bit marking fields, each of which corresponds to a path to the router. Each of the marking fields is then repeated the same number of times as input, so that every path carries the same number of packets. The overall computing time is recorded. The simulation is repeated 100 times and we calculate the average computing time of the 100 simulations. As for the simulation of MRT's (or MORE's) logging scheme, we also randomly choose a set of 32-bit marking fields for MRT (or 16-bit marking fields for MORE). After that, each of the marking fields is repeated the same number of times as input, and the overall computing time is recorded. The simulation is also repeated 100 times and the average computing time is calculated.

Fig. 10. Computing time of logging schemes using MurmurHash2.

Fig. 11. Computing time of logging schemes using FNVHash.

Fig. 12. Computing time of logging schemes using MD5.

First, MurmurHash2 is adopted to simulate the logging schemes of RIHT, MRT, and MORE. The comparison of their computing time is illustrated in Fig. 10. The result shows that the logging time of RIHT is clearly shorter than that of the other two schemes. The second simulation adopts FNVHash to calculate the three schemes' logging time, and the result is that RIHT is about three times faster than MRT and MORE (see Fig. 11). We use MD5 in the third simulation and find that RIHT is faster than MRT and MORE once each path carries more than 60 packets (see Fig. 12). Since a path in the Internet is easily taken by more than 60 packets, we conclude that the logging time of our scheme is shorter than that of MRT and MORE.

As for the computing time of a path reconstruction, MRT and MORE both require that a router search its own log table, using the request packet's digest, to find its previously stored marking field. However, since the routers' log tables in MRT and MORE are unsorted, the search is by brute force. Therefore, the average searching time required for MRT grows with the number of logged packets in a log table, and for MORE it grows with the number of logged packets in the particular log table being searched. But in RIHT, we only need to get the index stored in the request packet's marking field, and with that index we can obtain the logged data from the hash table without any search. Since we do not need to spend time on searching, the path reconstruction in our scheme is faster than that in MRT and MORE.

B. Storage Requirement

Our scheme maintains a hash table and an interface table on a router, while MRT and MORE maintain log tables and an interface table on a router. Since the storage requirement of an interface table is negligible, we leave it out of our storage requirement analysis. In RIHT, the size of a hash table decides how many paths can be logged on a router. Two arbitrary packets in RIHT take the same path to a router if and only if they have the same marking field on the router. Thus, our scheme regards the marking field of a packet as one path to a router. For discussion, we say that a path to a router $R$ needs to be logged on $R$ if the marking field of every packet taking this path needs to be logged on $R$. A hash table's load factor is $\alpha = n/m$, where $n$ is the number of logged paths in the hash table and $m$ is the size of the table. As the analysis in Section IV-A suggests, we should keep the load factor at or below 0.5. Therefore, if the number of paths which need to be logged on a router is $n$, then the size of the hash table on the router should be set as $2n$. Furthermore, every entry in a hash table includes one 32-bit field and one 8-bit field; hence each entry uses 40 bits. Thus, we have the following theorem.

Theorem 1: Let $n$ be the number of paths needing to be logged on a router $R$ in RIHT. Then, the required size of the hash table on $R$ is at most $2n$ and the storage requirement for $R$ is at most $80n$ bits.

On the other hand, one entry in the log table of MRT contains a 32-bit digest and a 32-bit marking field. In MORE, one entry in a log table contains a 32-bit digest and a 16-bit marking field. Thus, the storage requirements for routers of MRT and MORE are $64p$ bits and $48p$ bits, respectively, where $p$ is the number of logged packets on the router. Packets on the same route are logged on the same routers. Hence, if there is a flooding-based attack, their log tables will grow hugely in a short time. However, in our scheme, the size of a hash table is fixed, which secures our scheme against flooding-based attacks.

To determine the required size of the hash table on a router in the Internet, we run the following simulation. To simulate the Internet topology, we use the skitter project topology distributed by CAIDA [7] as our sample data set of the Internet. The data set consists of paths to a specific host of the topology. We analyze CAIDA's skitter data and find that it keeps 197 003 complete paths in total to the host, and the average hop count of its paths is 15.46. Its average upstream degree is 3.89, and more than 99.99% of the routers have degrees smaller than 256. Fig. 13 illustrates the distribution of path length.

We implement RIHT on CAIDA's topology data. The result shows that we need to log 159 641 paths, and the three most heavily logged routers log 23 462, 22 149, and 13 381 paths, respectively. Hence, by Theorem 1, it is enough to set the size of each hash table as $2^{16}$. In fact, if each router is able to set its hash table size to what it actually requires, the storage requirement of RIHT can be substantially reduced.
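The storage figures in Section IV-B can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and rounding the table size up to the next power of two is an assumption made here for convenience. It uses the entry sizes stated in the text (40 bits per RIHT entry, 64 bits per MRT entry, 48 bits per MORE entry) and the load-factor limit of 0.5.

```python
import math

RIHT_ENTRY_BITS = 40   # 32-bit marking field + 8-bit upstream interface number
MRT_ENTRY_BITS = 64    # 32-bit digest + 32-bit marking field
MORE_ENTRY_BITS = 48   # 32-bit digest + 16-bit marking field


def riht_table_size(logged_paths, max_load=0.5):
    """Smallest power-of-two slot count that keeps the load factor <= max_load."""
    needed = math.ceil(logged_paths / max_load)
    size = 1
    while size < needed:
        size *= 2
    return size


def riht_storage_bytes(table_size):
    """Fixed RIHT storage: depends on the table size, not on packet count."""
    return table_size * RIHT_ENTRY_BITS // 8


def mrt_storage_bytes(logged_packets):
    """MRT storage grows linearly with the number of logged packets."""
    return logged_packets * MRT_ENTRY_BITS // 8


def more_storage_bytes(logged_packets):
    """MORE storage also grows linearly with the number of logged packets."""
    return logged_packets * MORE_ENTRY_BITS // 8


if __name__ == "__main__":
    size = riht_table_size(23462)          # most heavily logged router in the text
    print(size, riht_storage_bytes(size))  # slot count and bytes
```

For the most heavily logged router (23 462 paths), at least 46 924 slots are needed, which rounds up to 2^16 = 65 536 slots, or 327 680 bytes (320 × 1024), in line with the roughly 320 KB per router quoted in the abstract. By contrast, an MRT router that logs one million attack packets would already need 8 000 000 bytes.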


Fig. 13. Distribution of path length.

C. False Positive and False Negative Rates

When a router is mistaken for an attack router, we call it a false positive. When we fail to trace back to an attacker, we call it a false negative. In MRT and MORE, the size of a log table increases with the number of logged packets, but a router's memory is limited. Thus, when those schemes run out of memory, they have to refresh their log tables. False positives or false negatives occur when the logged data is refreshed. Unlike MRT and MORE, RIHT's hash table size depends on the number of logged paths, and the table does not have to be refreshed. Therefore, RIHT has no false positive or false negative problem in this respect.

Even if the logged data is not cleared, MRT and MORE are still unable to avoid the false positive problem. In MRT, a router logs the marking fields of packets, indexed by the digests of the packets, in the log table. In MORE, a router uses different log tables, which are associated with […], to log the marking fields of packets indexed by the digests of the packets. Therefore, the false positive rates of MRT and MORE are greater than 0 even without refreshing if a collision of digests happens in a log table. On the other hand, in RIHT, since we mark an index on each logged packet's marking field, we can use that index to obtain the logged data directly from the hash table and circumvent the collision problem. Therefore, without any chance of a collision in our scheme, our false positive rate is 0, hence higher accuracy.

D. Packet Identity

In the same route, every packet's marking field is the same on an arbitrary router. Hence, a packet's marking field is often seen as a packet identity [23] and used to help us identify an attack packet's source and then filter malicious packets. But in the MRT and MORE schemes, we are unable to identify a packet's source from its marking field in the following situation. Fig. 2, for example, illustrates packets that are logged on the same router, say […] in this case, come from different sources, and turn out to carry the same values in their marking fields when they are heading to the victim. For this reason, the victim as well as […] and […] are confused and unable to identify the packets' source simply from the comparison of marking fields. However, in our scheme, the marking fields of packets are identical to each other on a router if and only if they travel on the same route to the router. Therefore, when filtering malicious packets, our scheme's marking field can act as a packet identity and avoid such confusion.

V. CONCLUSION

In this paper, we propose a new hybrid IP traceback scheme (RIHT) for efficient packet logging, aiming to have a fixed storage requirement (under 320 KB, according to CAIDA's skitter data set) in packet logging without the need to refresh the logged tracking information. Also, the proposed scheme has zero false positive and false negative rates in attack-path reconstruction. Apart from these properties, our scheme can also deploy a marking field as a packet identity to filter malicious traffic and secure against DoS/DDoS attacks. Consequently, with high accuracy, a low storage requirement, and fast computation, RIHT can serve as an efficient and secure scheme for hybrid IP traceback. As for our future work, we would like to come up with another version of RIHT which uses a 16-bit marking field to avoid the problem caused by packet fragmentation.

REFERENCES

[1] B. Al-Duwari and M. Govindarasu, "Novel hybrid schemes employing packet marking and logging for IP traceback," IEEE Trans. Parallel Distributed Syst., vol. 17, no. 5, pp. 403–418, May 2006.
[2] A. Appleby, Murmurhash 2010 [Online]. Available: http://sites.google.com/site/murmurhash/
[3] A. Belenky and N. Ansari, "IP traceback with deterministic packet marking," IEEE Commun. Lett., vol. 7, no. 4, pp. 162–164, Apr. 2003.
[4] A. Belenky and N. Ansari, "Tracing multiple attackers with deterministic packet marking (DPM)," in Proc. IEEE PACRIM'03, Victoria, BC, Canada, Aug. 2003, pp. 49–52.
[5] S. M. Bellovin, M. D. Leech, and T. Taylor, "ICMP traceback messages," Internet Draft: Draft-Ietf-Itrace-04.Txt, Feb. 2003.
[6] H. Burch and B. Cheswick, "Tracing anonymous packets to their approximate source," in Proc. USENIX LISA 2000, New Orleans, LA, Dec. 2000, pp. 319–327.
[7] "CAIDA's Skitter Project," CAIDA, 2010 [Online]. Available: http://www.caida.org/tools/skitter/
[8] K. H. Choi and H. K. Dai, "A marking scheme using Huffman codes for IP traceback," in Proc. 7th Int. Symp. Parallel Architectures, Algorithms Networks (SPAN'04), Hong Kong, China, May 2004, pp. 421–428.
[9] C. Gong and K. Sarac, "A more practical approach for single-packet IP traceback using packet logging and marking," IEEE Trans. Parallel Distributed Syst., vol. 19, no. 10, pp. 1310–1324, Oct. 2008.
[10] A. Hussain, J. Heidemann, and C. Papadopoulos, "A framework for classifying denial of service attacks," in Proc. ACM SIGCOMM '03, Karlsruhe, Germany, Aug. 2003, pp. 99–110.
[11] W. John and S. Tafvelin, "Analysis of internet backbone traffic and header anomalies observed," in Proc. IMC '07: 7th ACM SIGCOMM Conf. Internet Measurement, San Diego, CA, Oct. 2007, pp. 111–116.
[12] W. John and T. Olovsson, "Detection of malicious traffic on backbone links via packet header analysis," Campus-Wide Inform. Syst., vol. 25, no. 5, pp. 342–358, 2008.
[13] D. E. Knuth, The Art of Computer Programming, 2nd ed. Redwood City, CA: Addison Wesley Longman, 1998, vol. 3, pp. 513–558.
[14] T. Korkmaz, C. Gong, K. Sarac, and S. G. Dykes, "Single packet IP traceback in AS-level partial deployment scenario," Int. J. Security Networks, vol. 2, no. 1/2, pp. 95–108, 2007.
[15] S. Malliga and A. Tamilarasi, "A proposal for new marking scheme with its performance evaluation for IP traceback," WSEAS Trans. Computer Res., vol. 3, no. 4, pp. 259–272, Apr. 2008.
[16] S. Malliga and A. Tamilarasi, "A hybrid scheme using packet marking and logging for IP traceback," Int. J. Internet Protocol Technol., vol. 5, no. 1/2, pp. 81–91, Apr. 2010.
[17] L. C. Noll, FNV Hash 2010 [Online]. Available: http://www.isthe.com/chongo/tech/comp/fnv/index.html


[18] S. Savage, D. Wetherall, A. Karlin, and T. Anderson, "Practical network support for IP traceback," in Proc. ACM SIGCOMM 2000, Stockholm, Sweden, Aug. 2000, pp. 295–306.
[19] A. C. Snoeren, C. Partridge, L. A. Sanchez, C. E. Jones, F. Tchakountio, B. Schwartz, S. T. Kent, and W. T. Strayer, "Single-packet IP traceback," IEEE/ACM Trans. Networking, vol. 10, no. 6, pp. 721–734, Dec. 2002.
[20] D. X. Song and A. Perrig, "Advanced and authenticated marking schemes for IP traceback," in Proc. IEEE INFOCOM 2001, Anchorage, AK, Apr. 2001, pp. 878–886.
[21] I. Stoica and H. Zhang, "Providing guaranteed services without per flow management," in Proc. ACM SIGCOMM '99, Boston, MA, Sep. 1999, pp. 81–94.
[22] "The MD5 Message-Digest Algorithm," IETF RFC 1321, 1992.
[23] X. J. Wang and Y. L. Xiao, "IP traceback based on deterministic packet marking and logging," in Proc. SCALCOM-EMBEDDEDCOM'09, Dalian, China, Sep. 2009, pp. 178–182.
[24] A. Yaar, A. Perrig, and D. Song, "FIT: Fast internet traceback," in Proc. IEEE INFOCOM 2005, Miami, FL, Mar. 2005, pp. 1395–1406.
[25] L. Zhang and Y. Guan, "TOPO: A topology-aware single packet attack traceback scheme," in Proc. IEEE Int. Conf. Security Privacy Communication Networks (SecureComm 2006), Baltimore, MD, Aug. 2006, pp. 1–10.
[26] C. Gong and K. Sarac, "Toward a practical packet marking approach for IP traceback," Int. J. Network Security, vol. 8, no. 3, pp. 271–281, Mar. 2009.
[27] F. Gont, "Security assessment of the internet protocol version 4," Internet Draft: Draft-Ietf-Opsec-Ip-Security-07.Txt, Apr. 2011.

Ming-Hour Yang received the Ph.D. degree in computer science and information engineering at National Central University, Taiwan. He is currently with the Department of Information and Computer Engineering, Chung Yuan Christian University, Taiwan. His research mainly focuses on network security and system security with particular interest in security issues in RFID and NFC security communication protocols. Topics include: mutual authentication protocols, secure ownership transfer protocols, polymorphic worms, tracing mobile attackers.

Ming-Chien Yang received the Ph.D. degree in computer and information science from National Chiao Tung University, Taiwan, in 2005. From 2005 to 2007, he was an Engineer in the Industrial Technology Research Institute (ITRI). He has been a Faculty Member of Aletheia University, Taiwan, since 2008. His current position is an Assistant Professor in the Department of Information Application. His research interests include parallel and distributed computing, information security, interconnection networks, design and analysis of algorithms, and digital home.
