A STUDY OF FEATURE SELECTION METHODS IN INTRUSION DETECTION SYSTEM

INTRUSION DETECTION SYSTEM

An intrusion is defined as an attempt to compromise the confidentiality, integrity, or availability of a computer system or network, to make unauthorized use of its resources, or to bypass its security mechanisms. James P. Anderson introduced Intrusion Detection (ID) in the early 1980s, and Dorothy Denning proposed several models for IDS in 1987 [3]. Ideally, intrusion detection should be an intelligent process that monitors the events occurring in a system and analyzes them for violations of security policies. An IDS is required to achieve a high attack Detection Rate (DR) with a low False Alarm Rate (FAR).

Based on the detection technique, IDS approaches are classified as anomaly-based and misuse-based. In the anomaly-based approach, the system first learns the normal behavior of the monitored system or network and then flags significant deviations from that profile as intrusions. In the misuse-based (signature-based) approach, the system first defines each attack and the characteristics that distinguish it from normal data or traffic, and then detects intrusions by matching observed events against these signatures. Based on the location of monitoring, IDSs are classified as Network-based Intrusion Detection Systems (NIDS) and Host-based Intrusion Detection Systems (HIDS). A NIDS detects intrusions by monitoring network traffic at the level of IP packets. A HIDS is installed locally on a host machine and detects intrusions by examining system calls, application logs, file system modifications, and other activities performed by each user on that machine.
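To make the two detection philosophies concrete, the following is a minimal Python sketch; the signature rules, feature names, and thresholds are all hypothetical illustrations rather than parts of any particular IDS. The misuse detector matches connections against hand-written attack signatures, while the anomaly detector first learns a normal traffic profile and then flags large deviations from it.

```python
# Minimal sketch contrasting misuse-based and anomaly-based detection.
# All signatures, feature names, and thresholds here are hypothetical.
from statistics import mean, stdev

# Misuse-based: match events against known attack characteristics.
SIGNATURES = {
    "syn_flood": lambda c: c["syn_count"] > 1000 and c["ack_count"] == 0,
    "port_scan": lambda c: c["distinct_ports"] > 100,
}

def misuse_detect(conn):
    """Return the name of the first matching attack signature, if any."""
    for name, matches in SIGNATURES.items():
        if matches(conn):
            return name
    return None

# Anomaly-based: learn normal behavior first, then flag large deviations.
class AnomalyDetector:
    def __init__(self, k=3.0):
        self.k = k  # flag values more than k standard deviations from normal

    def fit(self, normal_values):
        """Learn the normal profile from attack-free training traffic."""
        self.mu = mean(normal_values)
        self.sigma = stdev(normal_values)

    def is_anomalous(self, value):
        return abs(value - self.mu) > self.k * self.sigma

detector = AnomalyDetector()
detector.fit([10, 12, 11, 9, 13, 10, 12])  # e.g., packets/sec when normal
print(detector.is_anomalous(55))           # True: far outside the profile
print(misuse_detect({"syn_count": 5000, "ack_count": 0, "distinct_ports": 3}))
```

Note that the misuse detector can only name attacks it already knows, while the anomaly detector can flag novel attacks, typically at the cost of a higher false alarm rate.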

DATASETS AND PERFORMANCE EVALUATION

This section summarizes the popular benchmark datasets and the performance evaluation measures used in the intrusion detection domain to evaluate different feature selection methods.

DATASETS

The KDD Cup 1999 benchmark dataset is widely used to evaluate feature selection methods for IDS. It consists of 4,940,000 connection records in the training set and 311,029 connection records in the test set. The training set contains 24 attack types and the test set contains 38. Since the full training and test sets are prohibitively large, a 10% subset of the KDD Cup '99 dataset is frequently used instead. Each connection record is labeled either as normal or with exactly one specific attack type, and every attack type falls into one of four categories: Denial of Service (DoS), User to Root (U2R), Remote to Local (R2L), and Probing. Each connection record consists of 41 features, numbered 1 through 41, which fall into the four categories shown in Table 1.

Table 1: Feature categories of the KDD Cup '99 dataset

Category 1 (features 1-9): Basic features of individual TCP connections

Category 2 (features 10-22): Content features within a connection, suggested by domain knowledge

Category 3 (features 23-31): Traffic features computed using a two-second time window

Category 4 (features 32-41): Traffic features computed using a window of 100 connections to the same destination host
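As an illustration of how the 10% subset is typically prepared for feature selection experiments, below is a minimal pandas sketch that loads the raw file and maps each attack label to its category. The label-to-category map shown is abbreviated and is an assumption for illustration; in practice it would need to cover every attack type present in the data.

```python
# Sketch: loading the KDD Cup '99 10% subset and mapping attack labels to
# the four attack categories. The category map below is abbreviated and
# would need to be extended to cover all attack types in the data.
import pandas as pd

# The raw file has no header row: 41 features followed by the label.
COLUMNS = [f"f{i}" for i in range(1, 42)] + ["label"]

ATTACK_CATEGORY = {
    "normal.": "normal",
    "smurf.": "DoS", "neptune.": "DoS", "back.": "DoS",
    "satan.": "Probe", "ipsweep.": "Probe", "portsweep.": "Probe",
    "guess_passwd.": "R2L", "warezclient.": "R2L",
    "buffer_overflow.": "U2R", "rootkit.": "U2R",
}

df = pd.read_csv("kddcup.data_10_percent", names=COLUMNS)
df["category"] = df["label"].map(ATTACK_CATEGORY).fillna("other")
print(df["category"].value_counts())
```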

PERFORMANCE EVALUATION

The effectiveness of an IDS is evaluated by its ability to make correct predictions. Comparing the real nature of a given event with the prediction made by the IDS yields four possible outcomes, shown in Table 2 and known as the confusion matrix. The True Positive Rate (TPR), also called the Detection Rate (DR), the True Negative Rate (TNR), the False Positive Rate (FPR), also called the False Alarm Rate (FAR), and the False Negative Rate (FNR) are the measures applied to quantify the performance of an IDS based on this confusion matrix.

Table 2: Confusion matrix

                    Predicted attack        Predicted normal
Actual attack       True Positive (TP)      False Negative (FN)
Actual normal       False Positive (FP)     True Negative (TN)
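These four rates follow directly from the confusion matrix counts. Writing TP, TN, FP, and FN for the four outcomes, the standard definitions are:

TPR (DR) = TP / (TP + FN)
TNR = TN / (TN + FP)
FPR (FAR) = FP / (FP + TN)
FNR = FN / (FN + TP)

For example, an IDS that correctly flags 95 of 100 attacks (TP = 95, FN = 5) while raising alarms on 2 of 200 normal connections (FP = 2, TN = 198) achieves DR = 0.95 and FAR = 0.01.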