Vol 02, Issue 01, June 2013 http://iirpublications.com

International Journal of Data Mining Techniques and Applications ISSN: 2278-2419

Detecting Data Theft Using Emergent Patterns


Jakka Srividya 1, Harini M. D. 2, Bharathi R. 3
1 Student, jsrividya9@gmail.com
2 Student, md.harini16@gmail.com
3 Assistant Professor, bharavi_kumar@yahoo.co.in

Integrated Intelligent Research (IIR)

Abstract
Detecting and mitigating insider threat is a critical element in the overall information protection strategy. By successfully implementing tactics to detect this threat, organizations mitigate the loss of sensitive information and also potentially protect against future attacks. This paper describes the effectiveness of the pattern Increased Review for Intellectual Property (IP) Theft by Departing Insiders, which helps organizations plan, prepare, and implement a strategy to mitigate the risk of insider theft of intellectual property (IP). We develop a detection method by stochastically modelling file system behaviour under both routine activity and copying, and by identifying emergent patterns in MAC timestamps that are unique to copying. These patterns are detectable even months afterwards. We have successfully used this method to investigate data exfiltration in the field. Our method presents a new approach to forensics: by looking for stochastically emergent patterns, we can detect silent activities that leave no artifacts.

Keywords: Emergent patterns, Insider, Intellectual property, MAC timestamp, Theft of IP

I. INTRODUCTION
This paper describes the results of an investigation to determine the effectiveness of a previously published pattern, Increased Review for Intellectual Property (IP) Theft by Departing Insiders, which helps organizations plan, prepare, and implement a strategy to mitigate the risk of insider theft of IP. Clear exposition of this pattern and our associated data analysis depends on the definition of several key terms:
• insider: an employee, contractor, or other trusted business partner of an organization
• intellectual property (IP): any information owned by the organization that the organization wishes to protect (i.e., keep secret)
• theft of IP: any exfiltration (copying or removal) of IP that harms the owner of the information, could harm the owner, or could benefit some party in a way that harms the owner
An insider attack describes the damage that can occur to the interests of an organization by a trusted individual with legitimate access to its network and system resources. Such an attack can occur through an inadvertent security breach by an authorized user, through a planned security breach by an authorized user, or through an outsider acting via a compromised system. A planned insider attack can result in the exfiltration or destruction of sensitive data, or it can compromise the communications network and various network servers and resources. In today's widely connected network environments, a successful insider attack could result in serious damage to the interests of an enterprise. Government sectors, which provide access to classified information to authorized personnel, are vulnerable to insider attacks. Compared to external threats, insider threats are more dangerous, more devastating, and more challenging to detect and prevent, since trusted individuals have access privileges, know the networks, and also have specific information they wish to exfiltrate.
Within the broader goals of mitigating insider attacks, our work addresses only the detection, deterrence, and prevention of deliberate and unintended distribution of sensitive content outside the organization, using the organization's system and network resources, by a trusted insider. The design pattern community generally advocates that a pattern should be used successfully in a significant development context at least three times to show its efficacy and gain the pattern community's acceptance. These uses are to be documented in the Known Uses section of the pattern write-up. We therefore refer to this view of patterns as the known-use view. Pattern mining in the known-use view involves examining how people have built systems in the past and capturing the essence of successful
approaches in the pattern format. Patterns are not created; they are discovered. In this paper, we do not attempt to continue the epistemological debate about patterns; we accept as legitimate the view of patterns as testable hypotheses. To our knowledge, no one has successfully demonstrated, or even attempted to use, the pattern Increased Review for IP Theft by Departing Insiders.

II. MAC TIMESTAMPS
Farmer and Venema's seminal work (Farmer, 2000; Venema, 2000; Farmer and Venema, 2004) describes reconstructing system activity via MAC timestamps. MAC timestamps are file system metadata that record a file's most recent Modification, Access, and Creation times. By plotting these on a timeline, investigators can reconstruct file system activity, and hence computer usage, at a particular time. An investigator can also plot a histogram of file system activity, showing the amount of activity per time period (Casey, 2004).
Seemingly, we should be able to use MAC timestamps to detect data exfiltration. However, the standard methods of MAC timestamp analysis fail to do this: neither timelines nor histograms can distinguish copying from other forms of file access. Moreover, Microsoft Windows NTFS systems do not update a file's access timestamp when it is copied. Unix-based systems implement copy commands in user code via standard reads of the source file and writes to the destination file (Sun Microsystems Inc., 2009a,b; Free Software Foundation Inc., 2010), whereas Windows provides a dedicated CopyFile() system operation (Microsoft Corporation, 2010a). Thus, Unix-based file systems do not distinguish copying a file from other forms of accessing it: both are done via read(), and both update the file's access timestamp. (This was experimentally confirmed using the cp command on a Linux 2.6.25 ext3 system.) Windows, however, distinguishes between the two at the system level.
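The timeline and histogram views described above can be sketched with the Python standard library. This is a minimal illustration under stated assumptions, not the tooling of the cited works. The `MacEntry` record and the function names are hypothetical. Timestamps are read through `os.stat` on a live system, which may not match what is on disk, as the experimental pitfalls later in this paper show. Also note that on Unix, `st_ctime` is the inode change time rather than a creation time.

```python
import os
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class MacEntry:
    """One timestamp of one file or folder (hypothetical record type)."""
    path: str
    kind: str         # "M" = modification, "A" = access, "C" = st_ctime (platform-dependent meaning)
    timestamp: float  # seconds since the Unix epoch


def collect_mac_times(root):
    """Return one MacEntry per M, A and C timestamp of every file and folder below `root`."""
    entries = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # lstat semantics: report the link itself, never follow symlinks
            st = os.stat(path, follow_symlinks=False)
            entries.append(MacEntry(path, "M", st.st_mtime))
            entries.append(MacEntry(path, "A", st.st_atime))
            # On Unix st_ctime is the inode change time, not the creation time
            entries.append(MacEntry(path, "C", st.st_ctime))
    return entries


def timeline(entries):
    """Timeline view: every timestamp in chronological order."""
    return sorted(entries, key=lambda e: e.timestamp)


def histogram(entries, bucket_seconds=3600):
    """Histogram view: number of timestamps per time bucket (default one hour), keyed by bucket start."""
    counts = Counter(int(e.timestamp // bucket_seconds) * bucket_seconds for e in entries)
    return dict(sorted(counts.items()))
```

For example, `histogram(collect_mac_times("/mnt/evidence"))` would count timestamps per hour under a hypothetical mounted image. Consistent with the argument above, neither view can tell a copy apart from an ordinary read. Each view only records that an access happened at some time.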
Our experiments (performed on a Microsoft Windows XP Professional 5.1.2600 system) confirm that Windows indeed does not update the access timestamp of the source file when copying it, making file copying seemingly invisible.

III. EMERGENT PATTERNS

The Increased Review for IP Theft by Departing Insiders pattern helps an organization plan, prepare, and implement a strategy to mitigate the risk of insider theft of IP. This section summarizes that pattern; full details are provided in our PLoP 2011 paper [Moore et al. 2011]. Insider threat case data show that the risk of insider theft of IP is greatest at the point of employee termination. This pattern helps reduce that risk through increased review of insiders' actions as they leave the organization. This increased review is above and beyond what might be required for an organization's baseline detection of potentially malicious insider actions.
The intended audience of this pattern is data owners within an organization (those who make decisions about the protection requirements for certain data, including who has access to it), as well as managers of departments across the organization: information technology, human resources, physical security, and legal. The pattern applies to organizations large enough to have these distinct departments and roles. However, smaller organizations may also benefit from this pattern if they can identify individuals with the associated responsibilities.

A. Context
The context for this problem is an organization that has valuable IP at risk of insider theft. IP includes any of an organization's sensitive or confidential information that it would like to protect. An insider of an organization includes any employee, contractor, or other business partner of that organization. The organization's critical point of action is when an insider is being terminated, either voluntarily (e.g., resigning) or involuntarily (e.g., firing).

B. Problem
How can the organization cost-effectively mitigate the risk of losing its critical IP? Data on 48 cases of theft of IP from our insider threat database show that over 50 percent of the insiders stole at least some of the information within 30 days of their termination.
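The 30-day figure is what makes a bounded review window before termination, as called for in the Solution below, practical. A minimal Python sketch of such a review follows, assuming an in-memory audit log. The `AuditEvent` record, the function names, and the 30-day default are illustrative assumptions. The pattern itself only calls for a well-defined window and does not prescribe a log format.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class AuditEvent:
    """One entry from a hypothetical audit log of employee online actions."""
    user: str
    day: date
    action: str    # e.g. "read", "copy", "email"
    resource: str  # identifier of the file, share, or repository touched


def review_window_events(events, user, termination_date, window_days=30):
    """Return `user`'s events from `window_days` days before termination up to and including that date."""
    start = termination_date - timedelta(days=window_days)
    return [e for e in events
            if e.user == user and start <= e.day <= termination_date]


def suspicious_accesses(events, user, termination_date, critical_resources, window_days=30):
    """Events inside the review window that touch critical IP; these are reported to the data owners."""
    return [e for e in review_window_events(events, user, termination_date, window_days)
            if e.resource in critical_resources]


def oldest_log_date_to_retain(today, window_days=30):
    """Logs must reach back at least one full window, so that an insider who leaves
    immediately (termination date == today) can still be reviewed."""
    return today - timedelta(days=window_days)
```

A call such as `suspicious_accesses(log, "alice", date(2013, 6, 30), {"designs"})` returns only Alice's accesses to the hypothetical "designs" resource between 31 May and 30 June 2013 inclusive. `oldest_log_date_to_retain` expresses the requirement that logs cover at least one full window.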
Current case trends suggest that organizations regularly fail to detect theft of IP by insiders, and even when theft is detected, organizations find it difficult to attribute the crime to any specific individual. The solution to this pattern is affected by the following forces: cost of insider action review, employee privacy, IP ownership rights, employee productivity during the period between resignation and termination, and legal propriety of insider action review.

C. Solution
To deal adequately with the risk that departing insiders might take valuable IP with them, the organization must ensure that the necessary agreements are in place (IP ownership and consent to review), critical IP is identified, the activities of key departing insiders are reviewed, and the necessary communication among departments takes place. When an insider resigns, the organization should increase its scrutiny of that employee's activities within a well-defined window before the insider's termination date. Computer audit logs of employee online actions must be kept for at least the length of the review window, so that those logs may be scrutinized even if an insider terminates employment immediately. Actions taken upon and before employee termination are vital to ensuring that IP is not compromised and that the organization preserves its legal options.
The HR department needs to track insiders who have access to the IP so that, when such an insider resigns, HR can ask IT staff or systems to review that insider's online behaviour for signs of suspicious exfiltration of IP. IT staff or systems need to closely review the insider's access to critical IP during the review window before termination, because many IP thieves have stolen information within this window. Although the organization may decide to begin review before the review window, restricting review to this period may help the organization balance review costs against the risk of losing the IP. IT staff or systems must inform the data owners of any suspicious access to critical IP, and the data owners must be included in the response decision-making.

D. Expected Benefits
The primary expected benefit of the Increased Review for IP Theft by Departing Insiders pattern is that review of departing insiders is tailored to ensure a good cost-benefit ratio, while keeping insiders productive during their final days at work.
We can distinguish between the access pattern of copying and that of routine access. Routine file access is selective: individual files and folders are opened while others are ignored. It is also temporally irregular: files are accessed in response to user or system activity, followed by a lull in access until the next activity causes new file access. Copying of folders, however, is non-selective: every file and subfolder within the folder is copied. It is furthermore temporally continuous: files are copied sequentially without pause until the entire operation is complete. Copying folders is also recursive: copying one folder invokes the copying of all its subfolders, each of which invokes the copying of its own subfolders, and so on, whereas routine activity is randomly ordered.
This recursive nature of copying results in an additional trait. To copy a folder, the system must enumerate the folder's contents. Modern file systems implement folders as special types of files called directories; to enumerate a folder's contents, the system accesses and reads the directory file. Thus, copying will invariably access a directory before accessing its files and subfolders. Moreover, since this is a data read and not a file copy, Windows NTFS does update the access time of the directory when its contents are enumerated.
Thus, although copying creates no artifact, as noted earlier, it does create distinct emergent patterns. A file system examined immediately after copying occurs will show the five characteristics enumerated above. Consequently, if a folder was copied, we can expect to find the following, even if several weeks or months have elapsed since the time of copying:
• Neither the copied folder nor any of its subfolders has an access timestamp earlier than the time of copying.
• A large number of these folders have access timestamps equal to the time of copying.
• On Windows, file timestamps will not resemble folder timestamps; specifically, many files will have access timestamps earlier than those of any of the folders.
In the course of our experiments, we found access timestamp behaviour to be quite mercurial. The experimental pitfalls we encountered, and their solutions, are as follows.
• Systems may, for performance reasons, decline to update an access timestamp. Since maintaining accurate access timestamps may involve substantial performance costs and is not deemed system critical, systems may decline to update them. In many systems, this is user configurable (Microsoft Corporation, 2003a). In particular, some versions of Microsoft Windows ship configured to disable access timestamp updates. Complicating things further, some systems may update timestamps selectively, for instance only when the new timestamp differs from the previous one by more than a certain threshold. The recommended solution is to check the system configuration and documentation before experimenting, and to observe system behaviour exhaustively under different scenarios.
• Systems may, for performance reasons, defer writing updated access timestamps to the file system. Even when file systems do maintain accurate access timestamps, they may cache the updates in memory before writing them to disk (Microsoft Corporation, 2003b). Thus, if a file system is examined before the system has been shut down properly, its access timestamps may not be accurate.
• Systems may report updated access timestamps even before writing them to disk. When updates are deferred, queries to the system for access time may return the updated value stored in memory, even though it has not yet been written to disk. Thus, an experimenter may find one value when querying the operating system and another when directly examining the file system.
• Querying a file's access timestamp may itself update it. For instance, we have found that using Windows Explorer to display a file's access timestamp causes the access timestamp to be updated to the current time.
These last three problems can be solved by not using the standard operating system facilities to query access time, but instead shutting the operating system down normally and then directly examining the file system image using specialized tools. Admittedly, this makes experimentation cumbersome.

IV. CONCLUSIONS
As noted, copying of data has no known artifacts.
Nonetheless, we can reliably detect emergent patterns unique to copying, even months after it occurs. Statistical mechanics, which treats objects as individually unpredictable and looks for patterns that nonetheless emerge stochastically, gives us insight beyond the classical laws from which it derives. Similarly, we believe stochastic forensics provides the means to analyze hitherto undetectable activity.

REFERENCES
Moore Andrew P, McIntire David, Mundie David, Zubrow David. Effectiveness of a pattern for detecting intellectual property theft by departing insiders. CERT Program and Software Engineering Measurement and Analysis, Software Engineering Institute.
Grier Jonathan. Detecting data theft using stochastic forensics. Vesaria, LLC, United States.
Carvey Harlan. Windows forensic analysis DVD toolkit. 2nd ed. Syngress Publishing; 2009.
Carvey Harlan, Altheide Cory. Tracking USB storage: analysis of Windows artifacts generated by USB storage devices. Digital Invest 2005;2(2):94–100.
Casey Eoghan. Digital evidence and computer crime. Orlando, FL, USA: Academic Press, Inc.; 2004.
Chow KP, Law Frank YW, Kwan Michael YK, Lai Pierre KY. The rules of time on NTFS file system. In: SADFE '07: Proceedings of the second international workshop on systematic approaches to digital forensic engineering. Washington, DC, USA: IEEE Computer Society; 2007. p. 71–85.
CSI and FBI. 2003 computer crime and security survey. Tech Rept; 2003.
Farmer Dan. What are MACtimes? Dr Dobb's J Software Tools 2000;25(10):70–4.
Farmer Dan, Venema Wietse. Forensic discovery. Addison Wesley Professional; 2004.
