Research by Nicholas Garofalo and Dr. W. Harrison
Saint John Fisher College, Computer Science Department, 2007 to 2009
Research and development of a kernel-level intrusion detection system that detects intrusions based on kernel module call frequencies. Worked primarily in C on Linux.
- Aided in the creation of an isolated testing environment for security research.
- Constructed and administered trials using various computer attacks in order to capture data for attack signatures.
- Implemented software for automation of attack signature calculation using large data sets from trials.
The following paper was published in 2009 and summarizes two years of research into embedding an IDS in the Linux kernel. A large part of the work not shown here is the development of a dedicated computer security lab at Saint John Fisher College. The lab consists of multiple dual-NIC desktops and KVM switches, which allow small-scale attack scenarios to be configured easily. The lab was used for security courses as well as for running the scenarios in this research.
Computer security has become an increasingly important aspect of today’s digital world. Computer security, or information security, is necessary to protect our data, money, and even identity. Part of this protection is knowing if and when a system is under attack. An Intrusion Detection System (IDS) does just that. It detects when an intrusion or other malicious attack has occurred and allows actions to be taken that will prevent or repair damage.
The goal of this project was to fully instrument the Linux kernel using the Linux Instrumentation Tool (LIT) so that computer intrusions may be detected based on the frequency of kernel module calls. Previously, only the modules concerning networking functions had been instrumented, and the system had been shown to produce data. Completion of this project shows that we are able to collect data from the entire kernel, and also suggests further work to be done in the area of detection algorithms.
It should be noted that our intention was to create a library of attack data which may be used for the creation of attack signatures. These signatures could then be used for detection. The recovery process after or during an attack is outside the scope of this research.
The advantage of a system based on system calls is that it is faster and more dependable than an IDS based on other variables. The data readout is nearly real-time and produces very little overhead, allowing users to continue working at their normal pace. System calls are also far harder to fake or hide: any program must use these calls in order to work, and since the IDS is embedded inside the kernel itself, the calls cannot be hidden from it.
During our research we considered other variables that could be taken into account, such as system logs and timestamps. While some of these would add value for forensic analysis after an attack, they offered no advantage when attempting to detect an ongoing attack in real time. For this reason our system is concerned only with module call frequency: the number of times a module was called within a set time interval.
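To make this definition concrete, here is a minimal C sketch that turns a list of timestamped module calls into per-interval call counts. The event structure, the number of modules, and the interval length are hypothetical choices for illustration; LIT's actual log format is not described in this paper.

```c
#include <stddef.h>

#define N_MODULES 4   /* hypothetical number of instrumented modules */
#define N_FRAMES  60  /* time frames per recording */

/* One recorded call: which module ran and when (milliseconds from start). */
struct call_event {
    int module_id;
    unsigned long t_ms;
};

/* Count how many times each module was called in each interval of
 * interval_ms milliseconds (interval_ms must be > 0). Events past the last
 * frame or with an unknown module id are ignored. */
void bucket_calls(const struct call_event *events, size_t n_events,
                  unsigned long interval_ms,
                  unsigned int freq[N_FRAMES][N_MODULES])
{
    size_t i;
    int f, m;

    for (f = 0; f < N_FRAMES; f++)
        for (m = 0; m < N_MODULES; m++)
            freq[f][m] = 0;

    for (i = 0; i < n_events; i++) {
        unsigned long frame = events[i].t_ms / interval_ms;
        int id = events[i].module_id;

        if (frame >= N_FRAMES || id < 0 || id >= N_MODULES)
            continue;
        freq[frame][id]++;
    }
}
```

Each row of `freq` is then the frequency vector for one time interval, which is the only quantity the system described here works with.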
Background and Significance
To better understand the concepts of this research, we must first define some important terms that will be used to describe objects, techniques, etc. We begin with the most basic notion, the operating system. The operating system, the programs running on it, and the data it contains are what the IDS is designed to protect (Operating System, 2008).
An operating system (OS) is the software that manages the resources of a computer and provides programmers with an interface used to access those resources. An operating system processes system data and user input, and responds by allocating and managing tasks and internal system resources as a service to users and programs of the system. An operating system performs basic tasks such as controlling and allocating memory, prioritizing system requests, controlling input and output devices, facilitating computer networking and managing files.
The variety and variability of operating systems make it difficult to design cross-compatible systems, so cross-platform support was left outside the scope of our research. The resulting system depends on the Linux kernel. The kernel is the core component of the OS and was the key focus of our work. Below is the definition as well as a graphical depiction of how the kernel works. Here, it is fairly clear that the kernel essentially acts as a middleman between the software and the hardware. It is for this reason that implementing the IDS inside the kernel is so advantageous (Kernel, 2008).
… the kernel is the central component of most computer operating systems (OS). Its responsibilities include managing the system’s resources (the communication between hardware and software components). As a basic component of an operating system, a kernel provides the lowest-level abstraction layer for the resources (especially memory, processors and I/O devices) that application software must control to perform its function. It typically makes these facilities available to application processes through inter-process communication mechanisms and system calls.
For the creation of a kernel-level intrusion detection system, full access to the operating system kernel is a necessity. For this reason the open-source Linux kernel was used. Most other systems have a closed-source kernel, which cannot be modified without expensive licenses and contracts. The Linux kernel, however, is completely open source and well documented, which allowed us to make the changes necessary to embed the IDS. It should also be noted that changing kernel versions requires re-instrumenting the kernel.
Operating systems of all kinds are subject to attack, that is, malicious activity directed at the system. The purpose of an intrusion detection system is to detect this activity so that proper measures can be taken. It is important to note that an IDS may go about detecting intrusions in a number of ways. Some act as a virus scanner does, scanning files on disk or packets transmitted over the network. Others examine user login logs or file access records (Intrusion Detection System, 2008).
An intrusion detection system (IDS) generally detects unwanted manipulations of computer systems, mainly through the Internet. The manipulations may take the form of attacks by crackers.
An IDS is composed of several components: Sensors which generate security events, a Console to monitor events and alerts and control the sensors, and a central Engine that records events logged by the sensors in a database and uses a system of rules to generate alerts from security events received. There are several ways to categorize an IDS depending on the type and location of the sensors and the methodology used by the engine to generate alerts. In many simple IDS implementations all three components are combined in a single device or appliance.
Intrusion detection systems fall into two broad categories: signature-based detection and anomaly-based detection. Signature-based detection works off of a pre-existing database of attack signatures. It has the advantage of being very fast, accurate, and easy to set up. Its downfall is that it can only detect attacks that are in its database, which is to say attacks that have been seen before. Anomaly-based detection works on the idea of “different from the norm”: it detects whether a system has deviated too far from normal behavior. This has the advantage of being able to detect attacks that have never been seen before. Its downfalls, however, are a massive overhead and great difficulty of implementation. This difficulty arises because “normal behavior” must first be defined, and the system must undergo a vulnerable training period (Glickman, Balthrop, & Forrest, 2003).
Signature-based detection:
- Requires a precompiled signature database
- Greater occurrence of false negatives
- Relatively easy to train
- Little system overhead
Anomaly-based detection:
- Can detect new unique attacks
- Greater occurrence of false positives
- Difficult/vulnerable training
- Large system overhead
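For contrast with the signature approach used in this project, the sketch below shows one common way an anomaly detector can work on the same kind of data: learn each module's mean call frequency and variance from frames recorded during a training period assumed to be attack-free, then flag any frame in which some module deviates by more than k standard deviations. The z-score rule, the value of k, and all names here are illustrative assumptions, not part of this research.

```c
#include <stddef.h>

#define N_MODULES 4  /* hypothetical number of instrumented modules */

/* Per-module statistics learned during the training period. */
struct baseline {
    double mean[N_MODULES];
    double var[N_MODULES];   /* population variance */
};

/* Learn each module's mean frequency and variance from n_frames frames
 * recorded while the system was assumed to be behaving normally. */
void train_baseline(unsigned int frames[][N_MODULES], size_t n_frames,
                    struct baseline *b)
{
    size_t f;
    int m;

    for (m = 0; m < N_MODULES; m++) {
        double sum = 0.0, sq = 0.0;

        for (f = 0; f < n_frames; f++)
            sum += frames[f][m];
        b->mean[m] = n_frames ? sum / (double)n_frames : 0.0;

        for (f = 0; f < n_frames; f++) {
            double d = (double)frames[f][m] - b->mean[m];
            sq += d * d;
        }
        b->var[m] = n_frames ? sq / (double)n_frames : 0.0;
    }
}

/* Return 1 if any module's frequency in this frame lies more than k
 * standard deviations from its trained mean (tested as d*d > k*k*var so
 * no sqrt is needed). A module that never varied in training is flagged
 * on any change at all. */
int is_anomalous(const unsigned int frame[N_MODULES],
                 const struct baseline *b, double k)
{
    int m;

    for (m = 0; m < N_MODULES; m++) {
        double d = (double)frame[m] - b->mean[m];
        if (d * d > k * k * b->var[m])
            return 1;
    }
    return 0;
}
```

The training step is where the "vulnerable training period" comes from: if an attack happens while `train_baseline` is collecting data, its activity becomes part of what the detector considers normal.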
The system referenced in this paper is intended to use signature-based detection. A signature is that which gives an object or piece of information its identity; in the case of this kernel-level IDS, a signature is represented by the frequency of module calls within the attack window. This concept originated at the onset of the project (Harrison, Krings, & Hanebutte, 2004).
The first step in our research was to set up a test environment. Prior to this research, there was no lab or environment in which to run the tests we needed. Luckily, the onset of our research coincided with the building of the new Saint John Fisher security lab. This lab houses 6 machines, all connected on their own private network. This isolation was important to our work so as not to endanger the rest of the campus network.
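One way to read this definition, assuming a quiet (no-attack) recording of the same length is available for reference, is to total each module's calls over the attack window and subtract the quiet total, keeping only the excess activity. The window bounds, the clamping at zero, and the function name are assumptions for illustration rather than the project's exact procedure.

```c
#include <stddef.h>

#define N_MODULES 4  /* hypothetical number of instrumented modules */

/* Build a signature from frames [start, end) of an attack recording:
 * for each module, total calls in the window minus total calls in the same
 * frames of a quiet recording. Negative differences are clamped to zero so
 * the signature holds only the excess activity caused by the attack. */
void build_signature(unsigned int attack[][N_MODULES],
                     unsigned int quiet[][N_MODULES],
                     size_t start, size_t end,
                     unsigned long sig[N_MODULES])
{
    size_t f;
    int m;

    for (m = 0; m < N_MODULES; m++) {
        unsigned long a = 0, q = 0;

        for (f = start; f < end; f++) {
            a += attack[f][m];
            q += quiet[f][m];
        }
        sig[m] = a > q ? a - q : 0;
    }
}
```

The choice of `start` and `end` is exactly the observation window whose size the cited work sets out to optimize; a window that is too wide dilutes the excess with normal activity.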
These 6 machines were outfitted with KVM over Ethernet so that we could switch the display to any machine. Of these 6 machines, 2 were used for our testing: one acted as the target and the other as the attacker. The target was given a basic installation and then equipped with the instrumented kernel. The attacking machine was loaded with BackTrack 3, a Linux distribution pre-loaded with many penetration testing and attack frameworks, which gave us basic attack tools on hand. It was also necessary to launch attacks remotely from the attacker so as not to throw off the generation of signatures on the target. The developing IDS works by detecting attack signatures, which we previously defined as the excess activity caused by an attack. If the attack tools themselves ran on the target, their own activity would distort the results used for future monitoring.
During setup, Dr. Harrison compiled the instrumented kernel. We first attempted to simply drop the kernel into the existing Linux installation on the target, but were unable to do so. Instead, we replaced the target machine with the machine on which the kernel was compiled.
Instrumentation of the Linux kernel was done using a previously developed program called the Linux Instrumentation Tool (LIT). This tool inserts a “hook” function into each of the selected modules, which allows the frequency of module calls to be tracked over time. These frequencies are then stored in a log for later scrutiny. Below is an example module before and after being instrumented.
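/* Before instrumentation */
static void remove_memqueue (struct page *entry)
{
	struct page *next = entry->next;
	struct page *prev = entry->prev;

	next->prev = prev;
	prev->next = next;
}

/* After instrumentation: LIT adds a hook call that records each call to this module */
static void remove_memqueue (struct page *entry)
{
	struct page *next = entry->next;
	struct page *prev = entry->prev;

	/* hook call added by LIT */
	next->prev = prev;
	prev->next = next;
}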
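The hook itself can be very small. The following user-space sketch imitates the idea under stated assumptions: each instrumented function bumps a counter for its module id, and a periodic tick copies the counters into one row of a frequency log and resets them. The macro name `LIT_HOOK`, the module ids, the example function, and the tick mechanism are all hypothetical; LIT's real hook and log format are not shown in this paper, and a real kernel version would also need atomic or per-CPU counters because instrumented functions run concurrently on multiple CPUs.

```c
#include <stddef.h>

#define N_MODULES 4   /* hypothetical number of instrumented modules */
#define N_FRAMES  60  /* rows in the frequency log */

static unsigned int call_count[N_MODULES];         /* calls in current frame */
static unsigned int freq_log[N_FRAMES][N_MODULES]; /* one row per frame */
static size_t frames_logged;

/* Hypothetical hook: one line placed inside each instrumented function. */
#define LIT_HOOK(module_id) (call_count[(module_id)]++)

/* Called once per time frame (e.g. from a timer): store the counts for the
 * frame that just ended and restart counting from zero. Once the log is
 * full, further frames are counted but not stored. */
void frame_tick(void)
{
    int m;

    for (m = 0; m < N_MODULES; m++) {
        if (frames_logged < N_FRAMES)
            freq_log[frames_logged][m] = call_count[m];
        call_count[m] = 0;
    }
    if (frames_logged < N_FRAMES)
        frames_logged++;
}

/* Hypothetical instrumented function with module id 2. */
void remove_item(void)
{
    LIT_HOOK(2);
    /* ... original function body ... */
}
```

Keeping the hook to a single increment is what makes the overhead small enough for the near-real-time readout described earlier.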
Once all machines were ready, we began running various attacks. These attacks proved difficult to find due to their clandestine nature; we eventually found a large number of them at Hoobie.net. In all, we successfully ran half a dozen attacks, three times each, rebooting the machine between attacks. Many other attacks were attempted but failed to run on our particular system and setup. Of the successful attacks, about three showed promising results. The others showed little to no activity, which may be due to broken attack code or the absence of the targeted service on the target machine.
During the attacks, we used the instrumented kernel to create data files. These files contained the call frequency for each module over the course of 60 time frames. The data files were later plotted with gnuplot to provide a graphical representation of the data for analysis. Below is a sample graph of data from an example attack. Notice that particular module frequencies spike during the attack.
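A data file in a layout gnuplot reads directly can be produced with a few lines of C. This sketch assumes one row per time frame, with the frame number in the first column followed by one column per module; the actual column layout of the project's files is not documented here, and the column names are placeholders.

```c
#include <stdio.h>
#include <stddef.h>

#define N_MODULES 4  /* hypothetical number of instrumented modules */

/* Write a '#' header line naming the columns (gnuplot treats lines starting
 * with '#' as comments), then n_frames rows of "frame count0 count1 ...".
 * Returns 0 on success, -1 on a write error. */
int write_plot_data(FILE *out, unsigned int freq[][N_MODULES], size_t n_frames)
{
    size_t f;
    int m;

    if (fprintf(out, "# frame") < 0)
        return -1;
    for (m = 0; m < N_MODULES; m++)
        if (fprintf(out, " module%d", m) < 0)
            return -1;
    if (fputc('\n', out) == EOF)
        return -1;

    for (f = 0; f < n_frames; f++) {
        if (fprintf(out, "%lu", (unsigned long)f) < 0)
            return -1;
        for (m = 0; m < N_MODULES; m++)
            if (fprintf(out, " %u", freq[f][m]) < 0)
                return -1;
        if (fputc('\n', out) == EOF)
            return -1;
    }
    return 0;
}
```

With such a file saved as `attack.dat` (a placeholder name), the gnuplot command `plot "attack.dat" using 1:2 with lines, "" using 1:3 with lines` draws the first two modules against the frame number, which is enough to see a spike during the attack.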
Running the attacks again and attempting to detect them proved to be non-trivial. On one hand, detection is a binary decision of “attack” or “not an attack.” On the other hand, there are multiple variables that may or may not affect detection. For our testing, we used a “quiet” system: the only things running were the general OS processes, the IDS, and the attack (run remotely from the attacker machine). Background noise, heavy traffic, and other obstacles may affect the ability of the IDS to detect attacks (Mell, Hu, Lippmann, Haines, & Zissman). Even on a quiet system, however, the IDS must take into account time, severity, etc.
It was our original intention to observe the modules that spiked during the attack and use this data to form the attack signature. While monitoring the system, if the frequencies of these modules met or exceeded those found in the attack, the attack would be assumed to be occurring again. It may be the case that one frequency spiking is important, but the uniqueness of an attack may also depend on another module not spiking. Below is a diagram of an attack signature laid over various system patterns. Notice that only the last pattern would be seen as an attack.
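The matching rule described above, including the point that a module which did not spike can be as telling as one that did, might be sketched as follows. Each module in the signature is marked as one that must reach its attack frequency, one that must stay below a quiet limit, or one that is ignored. The three-way encoding and the per-module limits are assumptions made for illustration.

```c
#define N_MODULES 4  /* hypothetical number of instrumented modules */

enum sig_rule {
    SIG_IGNORE,  /* module does not contribute to this signature */
    SIG_SPIKE,   /* frequency must meet or exceed the recorded attack level */
    SIG_QUIET    /* frequency must stay below the given limit */
};

struct signature {
    enum sig_rule rule[N_MODULES];
    unsigned int level[N_MODULES];  /* attack level or quiet limit */
};

/* Return 1 only if every rule in the signature holds for this frame. */
int matches_signature(const unsigned int frame[N_MODULES],
                      const struct signature *s)
{
    int m;

    for (m = 0; m < N_MODULES; m++) {
        switch (s->rule[m]) {
        case SIG_SPIKE:
            if (frame[m] < s->level[m])
                return 0;
            break;
        case SIG_QUIET:
            if (frame[m] >= s->level[m])
                return 0;
            break;
        case SIG_IGNORE:
            break;
        }
    }
    return 1;
}
```

A frame where everything is busy therefore does not match by accident: a module that the signature requires to stay quiet rules it out, just as only the last pattern in the diagram would be seen as an attack.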
Measuring our detection effectiveness, a goal we were unable to achieve, would be a matter of checking the frequency of false positives and false negatives. A false positive occurs when the IDS detects an attack that is not actually occurring. A false negative occurs when the IDS fails to see an attack that is underway. Both of these are equally dangerous. Too many false positives is equivalent to the system “crying, ‘Wolf!’” and would eventually lead to it being ignored. Too many false negatives means that the protected system is persistently being attacked and the IDS is failing to alert anyone. In the end, we want both the number of false positives and false negatives to be as close to zero as possible (Puketza, Zhang, Chung, Mukherjee, & Olsson, 1996).
As stated earlier, much of our time was spent creating our research environment, collecting attacks, and running trials. We also spent a great deal of time developing the theory behind detection algorithms. Creating a baseline or threshold for attack signatures proved to be quite difficult. A direct comparison of recorded attack data to current data is useless, because even slight variances in activity would throw off the results. Past attempts at using a min-max approach were not dependable either, so we needed to develop our own methods.
During our trials we were also looking to analyze attacks based on attack stages. This idea led us to the notion of a threshold: activity rising above and falling back below this threshold would indicate the beginning and end of an attack stage. It was suggested that this threshold could be a calculated average of each frequency. Therefore, I created a program that would scan the time frames, average the frequency of each module, and allow for adjustment if needed. This program would then output a data file that resembled the sample data but could be used as a threshold for detecting attack stages.
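Once trials are labeled, the two error rates are simple ratios. The sketch below assumes each time frame has a ground-truth label (attack or not) and the IDS's verdict for that frame; the false positive rate is taken over frames with no attack and the false negative rate over frames during an attack. Counting per frame rather than per attack is itself an assumption.

```c
#include <stddef.h>

struct error_rates {
    double false_positive_rate;  /* alarms on frames with no attack */
    double false_negative_rate;  /* missed frames during an attack */
};

/* truth[i] is 1 if an attack was really underway in frame i;
 * alarm[i] is 1 if the IDS reported an attack in frame i.
 * A rate is 0 when its denominator is empty. */
struct error_rates score_detector(const int truth[], const int alarm[],
                                  size_t n)
{
    size_t i, fp = 0, fn = 0, negatives = 0, positives = 0;
    struct error_rates r;

    for (i = 0; i < n; i++) {
        if (truth[i]) {
            positives++;
            if (!alarm[i])
                fn++;
        } else {
            negatives++;
            if (alarm[i])
                fp++;
        }
    }
    r.false_positive_rate = negatives ? (double)fp / (double)negatives : 0.0;
    r.false_negative_rate = positives ? (double)fn / (double)positives : 0.0;
    return r;
}
```

Driving both rates toward zero is the goal stated above; in practice, tightening a detector to lower one rate tends to raise the other.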
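The earlier min-max method is not spelled out in this paper, so the following is only a generic illustration of the idea: record each module's minimum and maximum frequency across repeated runs of the same attack, then treat a frame as a match when every module falls inside that envelope. The names and the choice of one frame per run are assumptions.

```c
#include <stddef.h>

#define N_MODULES 4  /* hypothetical number of instrumented modules */

struct envelope {
    unsigned int lo[N_MODULES];
    unsigned int hi[N_MODULES];
};

/* Build the min-max envelope from n_runs (>= 1) recorded attack frames,
 * e.g. the peak frame of each repeated trial. */
void build_envelope(unsigned int runs[][N_MODULES], size_t n_runs,
                    struct envelope *e)
{
    size_t r;
    int m;

    for (m = 0; m < N_MODULES; m++) {
        e->lo[m] = e->hi[m] = runs[0][m];
        for (r = 1; r < n_runs; r++) {
            if (runs[r][m] < e->lo[m])
                e->lo[m] = runs[r][m];
            if (runs[r][m] > e->hi[m])
                e->hi[m] = runs[r][m];
        }
    }
}

/* Return 1 if every module's frequency lies within [lo, hi]. */
int within_envelope(const unsigned int frame[N_MODULES],
                    const struct envelope *e)
{
    int m;

    for (m = 0; m < N_MODULES; m++)
        if (frame[m] < e->lo[m] || frame[m] > e->hi[m])
            return 0;
    return 1;
}
```

With only a few trials per attack, the envelope is narrow, so a single module landing just outside the observed range is enough to miss the attack, which is one plausible reason such a rule proves undependable.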
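The averaging program described above is not reproduced in the paper; a minimal sketch of the same idea follows. It averages each module's frequency over all time frames, scales the average by an adjustment factor, and then marks the frames where a module's activity crosses that threshold upward (a possible stage start) or back downward (a possible stage end). The adjustment factor, the crossing rule, and the function names are assumptions.

```c
#include <stddef.h>

#define N_MODULES 4  /* hypothetical number of instrumented modules */

/* Threshold for each module: its mean frequency over n_frames frames,
 * multiplied by an adjustment factor (1.0 = plain average). */
void average_threshold(unsigned int freq[][N_MODULES], size_t n_frames,
                       double adjust, double thresh[N_MODULES])
{
    size_t f;
    int m;

    for (m = 0; m < N_MODULES; m++) {
        double sum = 0.0;
        for (f = 0; f < n_frames; f++)
            sum += freq[f][m];
        thresh[m] = n_frames ? adjust * sum / (double)n_frames : 0.0;
    }
}

/* For one module, write +1 into edges[f] where the frequency rises above
 * the threshold (stage start), -1 where it falls back to or below it
 * (stage end), and 0 elsewhere. Activity before frame 0 counts as below. */
void stage_edges(unsigned int freq[][N_MODULES], size_t n_frames, int module,
                 double thresh, int edges[])
{
    size_t f;
    int above = 0;

    for (f = 0; f < n_frames; f++) {
        int now = freq[f][module] > thresh;
        edges[f] = now == above ? 0 : (now ? 1 : -1);
        above = now;
    }
}
```

Because the average is taken over the attack recording itself, the attack's own spikes raise the threshold, which fits the later finding that this kind of threshold did not allow dependable detection.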
Unfortunately, we discovered late in our observations that such a threshold would not allow for dependable attack detection. Furthermore, statistically calculating a reliable attack threshold would prove to be extremely difficult given the variable and complex nature of the attack population.
Therefore, the final results of our research were a comprehensive environment in which to collect data, a library of various network-based attacks, and a collection of data from a number of those attacks. We have also eliminated several threshold creation methods that will not work, providing a stepping stone for future teams. My personal hope is that future research will find a statistically proven method to create the necessary threshold, or perhaps an alternate method for detecting attacks and attack stages. Doing so would finally complete a truly innovative system to detect computer intrusions, and change the way intrusion detection is done in the future.
- Glickman, M., Balthrop, J., & Forrest, S. A Machine Learning Evaluation of an Artificial Immune System. [cited April 23, 2008]. Available from ftp://citadel.sjfc.edu/Research/lisys-ecj-05.pdf
- Harrison, W., Krings, A., & Hanebutte, N. Optimizing the Observation Windows Size for Kernel Attack Signatures. Proceedings of the 37th Hawaii International Conference on System Sciences, 2004. [cited April 20, 2008]. Available from http://csdl2.computer.org/comp/proceedings/hicss/2004/2056/07/205670189a.pdf
- Hofmeyr, S., Forrest, S., & Somayaji, A. Lightweight Intrusion Detection for Networked Operating Systems.
- Hoobie.net. (2009). HooBie Inc. [cited April 315, 2009]. Available from http://www.hoobie.net
- Intrusion Detection System. (2009). In Wikipedia [Web]. Wikimedia Foundation, Inc. [cited March 30, 2009]. Available from http://en.wikipedia.org/wiki/Intrusion_detection_system
- Kernel (computer science). (2009). In Wikipedia [Web]. Wikimedia Foundation, Inc. [cited March 30, 2009]. Available from http://en.wikipedia.org/wiki/Kernel_%28computer_science%29
- Mell, P., Hu, V., Lippmann, R., Haines, J., & Zissman, M. An Overview of Issues in Testing Intrusion Detection Systems. [cited March 30, 2009]. Available from http://csrc.nist.gov/publications/nistir/nistir-7007.pdf
- Operating System. (2008). In Wikipedia [Web]. Wikimedia Foundation, Inc. [cited March 30, 2009]. Available from http://en.wikipedia.org/wiki/Operating_system
- Puketza, N., Zhang, K., Chung, M., Mukherjee, B., & Olsson, R. (1996). A Methodology for Testing Intrusion Detection Systems. [cited March 30, 2009]. Available from http://seclab.cs.ucdavis.edu/papers/tse96.pdf