A STUDY OF FEATURE SELECTION METHODS IN INTRUSION DETECTION SYSTEM: WRAPPER METHOD

In Middlemiss and Dick (2003), a simple Genetic Algorithm (GA) evolves weights for the features, and a k-nearest neighbour (KNN) classifier serves both as the fitness function of the GA and as the final classifier. The top five ranked features for each class are selected: {DoS: 23, 29, 1, 11, 24; R2L: 24, 3, 12, 23, 36; U2R: 24, 6, 31, 41, 17; Probe: 2, 37, 30, 3, 6}. The reported results indicate an increase in intrusion detection accuracy.
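
The idea of evolving feature weights with a KNN-based fitness can be illustrated with a minimal sketch. Everything here is an assumption made for illustration rather than the configuration used by Middlemiss and Dick: the toy data, the leave-one-out fitness, the elitist GA with uniform crossover and Gaussian mutation, and the names `weighted_knn_accuracy` and `evolve_weights`. Feature indices in the code are 0-based and do not correspond to KDD feature numbers.

```python
import random

def weighted_knn_accuracy(X, y, weights, k=3):
    """Leave-one-out accuracy of KNN under a feature-weighted squared distance."""
    correct = 0
    for i, xi in enumerate(X):
        neighbours = sorted(
            (sum(w * (a - b) ** 2 for w, a, b in zip(weights, xi, xo)), y[o])
            for o, xo in enumerate(X) if o != i
        )
        votes = [label for _, label in neighbours[:k]]
        predicted = max(set(votes), key=votes.count)  # majority vote
        correct += predicted == y[i]
    return correct / len(X)

def evolve_weights(X, y, pop_size=20, generations=30, seed=0):
    """Evolve one weight per feature; fitness is the weighted-KNN accuracy."""
    rng = random.Random(seed)
    n = len(X[0])
    population = [[rng.random() for _ in range(n)] for _ in range(pop_size)]
    history = []  # best fitness seen in each generation
    for _ in range(generations):
        scored = sorted(population,
                        key=lambda w: weighted_knn_accuracy(X, y, w),
                        reverse=True)
        history.append(weighted_knn_accuracy(X, y, scored[0]))
        next_gen = scored[:2]  # elitism: keep the two fittest unchanged
        while len(next_gen) < pop_size:
            p1, p2 = rng.sample(scored[:pop_size // 2], 2)  # parents from top half
            child = [rng.choice(pair) for pair in zip(p1, p2)]  # uniform crossover
            j = rng.randrange(n)  # mutate one weight, clipped to [0, 1]
            child[j] = min(1.0, max(0.0, child[j] + rng.gauss(0, 0.2)))
            next_gen.append(child)
        population = next_gen
    best = max(population, key=lambda w: weighted_knn_accuracy(X, y, w))
    return best, history

# Toy data (hypothetical): feature 0 equals the class label, the others do not.
X = [[float(i % 2), ((i // 2) % 5) / 4.0, ((i * 3) % 7) / 6.0] for i in range(20)]
y = [i % 2 for i in range(20)]

weights, history = evolve_weights(X, y)
ranking = sorted(range(len(weights)), key=lambda j: weights[j], reverse=True)
print("features ranked by evolved weight:", ranking)
```

Ranking features by their evolved weight and keeping the top few per class mirrors the "top five ranked features" selection described above, although the actual ranking criterion of the paper may differ.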

Mukkamala and Sung (2003) presented two methods to rank the important features: (1) the Performance-Based Ranking Method (PBRM) and (2) the Support Vector Decision Function Ranking Method (SVDFRM). With PBRM, the union of the important features for each of the 5 classes contains 31 features. With SVDFRM, the corresponding union contains 23 features. The 8 important features identified by both ranking methods are {feature no.: 1, 3, 5, 6, 23, 24, 32 and 33}. Experiments are performed for both methods with an SVM classifier (Table 8). Future Work: Ongoing experiments include 23-class (22 attack classes plus normal) feature identification using SVMs.

An Ant Colony Optimization (ACO) based intrusion feature selection algorithm is proposed in (Gao et al., 2005). The Fisher discrimination rate is adopted as the heuristic information for the ants’ traversal. A Least Squares based SVM classifier is adopted as the base classifier to evaluate the generated feature subsets. The number of features selected by the ACO-SVM method is 11 for Probe, 9 for DoS, and 14 for U2R & R2L. The feature names are not given in the paper. Table 9 shows the experimental results.

Bankovic et al. (2007) investigated the possibility of increasing the detection rate (DR) of U2R attacks in misuse detection. The features extracted using Principal Component Analysis (PCA) and Multi Expression Programming (MEP) are {U2R: 14, 33; DoS: 1, 5, 39; Normal: 3, 10, 12}. A genetic algorithm is employed to implement rules for detecting various types of attacks. Two additional rule sets are deployed to re-check the decision of the rule set for detecting U2R attacks. The experiments (Table 10) show that this system outperforms the best-performing model reported in the literature.

Chen et al. (2007) presented a wrapper-based feature selection method. A random search method named modified random mutation hill climbing (MRMHC) is introduced as the search strategy to select feature subsets, with Support Vector Machines (SVMs) as the classifier. The experimental results are shown in Table 11. Future Work: The method can be improved in its search strategy and evaluation criterion.
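
A wrapper of this kind can be sketched in a few lines. The sketch below is plain random mutation hill climbing, not the modified variant (MRMHC) of Chen et al., whose details are not given here. To stay dependency-free it scores each candidate subset with a leave-one-out 1-nearest-neighbour classifier instead of an SVM. The toy data and the function names `loo_1nn_accuracy` and `rmhc_select` are hypothetical.

```python
import random

def loo_1nn_accuracy(X, y, mask):
    """Leave-one-out accuracy of a 1-NN classifier restricted to masked features."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0  # an empty subset cannot classify anything
    correct = 0
    for i, xi in enumerate(X):
        best_dist, best_label = float("inf"), None
        for k, xk in enumerate(X):
            if k == i:
                continue
            dist = sum((xi[j] - xk[j]) ** 2 for j in cols)
            if dist < best_dist:
                best_dist, best_label = dist, y[k]
        correct += best_label == y[i]
    return correct / len(X)

def rmhc_select(X, y, iterations=200, seed=0):
    """Flip one random feature per step; keep the flip if the wrapper score
    improves, or stays equal while using fewer features."""
    rng = random.Random(seed)
    n = len(X[0])
    best = [rng.random() < 0.5 for _ in range(n)]  # random starting subset
    best_score = loo_1nn_accuracy(X, y, best)
    for _ in range(iterations):
        cand = best[:]
        j = rng.randrange(n)
        cand[j] = not cand[j]  # random mutation of a single bit
        score = loo_1nn_accuracy(X, y, cand)
        if score > best_score or (score == best_score and sum(cand) < sum(best)):
            best, best_score = cand, score
    return best, best_score

# Toy data (hypothetical): feature 0 separates the classes; features 1-4
# only encode which pair of samples a row belongs to, not its class.
y = [i % 2 for i in range(20)]
X = [[10.0 * (i % 2)] + [float((i // 2) * (j + 1)) for j in range(4)]
     for i in range(20)]

mask, score = rmhc_select(X, y)
print("selected feature indices:", [j for j, keep in enumerate(mask) if keep])
print("leave-one-out accuracy:", score)
```

On this toy data the search is expected to retain only feature 0. The tie-breaking rule that prefers smaller subsets is one simple way to push a wrapper toward a lightweight feature set; the evaluation criterion in the paper may be different.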

A multi-objective genetic fuzzy intrusion detection system (MOGFIDS) is proposed by Tsang et al. (2007). MOGFIDS is used as a genetic wrapper to search for a near-optimal feature subset. The 27 features selected by MOGFIDS are {feature no.: 2 (tcp, udp, icmp), 5, 6, 7, 8, 9, 11, 12, 13, 14, 17, 18, 22, 23, 25, 30, 32, 33, 34, 35, 36, 37, 38, 39 and 40}. MOGFIDS achieves the second-highest accuracy (ACC, 99.24%) and the lowest false positive rate (FPR, 1.1%) among the wrappers compared in the paper. Future Work: The approach can be applied to other complex problem domains such as face recognition and DNA computing.

Wang and Gombault (2008) proposed a system that extracts important features from raw network traffic, targeting only DDoS attacks in real computer networks. The top 9 ranked features {feature no.: 23, 32, 37, 33, 5, 24, 31, 39 and 3} are selected by the Information Gain and Chi-square methods and evaluated with Bayesian Networks and decision trees (C4.5), as shown in Table 12. Future Work: A practical real-time system for fast detection of DDoS attacks can be developed.
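
Ranking by Information Gain can be illustrated with a short sketch for discrete features; the Chi-square ranking is not shown. The four toy rows and the feature names `flag` and `protocol` are hypothetical and are not taken from the data used by Wang and Gombault.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """H(class) - H(class | feature) for one discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional

def rank_features(rows, labels, names):
    """Return (name, gain) pairs sorted from most to least informative."""
    gains = [(name, information_gain([row[i] for row in rows], labels))
             for i, name in enumerate(names)]
    return sorted(gains, key=lambda pair: pair[1], reverse=True)

# Toy connection records (hypothetical values).
names = ["protocol", "flag"]
rows = [["tcp", "S0"], ["udp", "S0"], ["tcp", "SF"], ["udp", "SF"]]
labels = ["attack", "attack", "normal", "normal"]

ranking = rank_features(rows, labels, names)
for name, gain in ranking:
    print(f"{name}: {gain:.3f} bits")
```

Here `flag` determines the class completely and `protocol` carries no class information, so `flag` ranks first. Selecting the first 9 entries of such a ranking corresponds to the top-ranked feature set described above; continuous features would first need to be discretized.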

Li et al. (2009) [48] proposed a wrapper-based feature selection method to build a lightweight intrusion detection system. A modified Random Mutation Hill Climbing (RMHC) method is applied as the search strategy to find a candidate feature subset, and a modified linear Support Vector Machine (SVM) is used to evaluate the candidate feature subset. A classification algorithm based on a decision tree whose nodes consist of linear SVMs is used to build the IDS from the selected feature subsets. The experiments show that the resulting systems achieve higher ROC (Receiver Operating Characteristic) scores than systems using all 41 features, in terms of detecting known attacks, detecting new attacks, and computational cost (Table 13).

Ali et al. (2010) improve the accuracy of the Signature Detection Classification (SDC) model by applying feature extraction to customized features. Features are extracted from the customized features using a GA (Genetic Algorithm), two-second-time and Hidden Markov techniques. Eleven features {feature no.: 5, 6, 13, 23, 24, 25, 26, 33, 36, 37 and 38} are extracted, and the best signature detection classification model is developed using JRip, Ridor, PART and a decision tree. The extracted features increased the detection rates by between 0.4% and 9% and reduced the false alarm rates by between 0.17% and 0.5%.

Gong et al. (2011) proposed a novel approach to feature selection for network intrusion detection based on Genetic Quantum Particle Swarm Optimization (GQPSO). A Support Vector Machine (SVM) is used as the classification algorithm. The selected features and experimental results are shown in Table 14.

Li et al. (2012) proposed an effective wrapper-based feature reduction method called the gradually feature removal (GFR) method. The GFR method extracted 19 critical features {feature no.: 2, 4, 8, 10, 14, 15, 19, 25, 27, 29, 31, 32, 33, 34, 35, 36, 37, 38 and 40}. The SVM classifier achieves an accuracy of 98.6249% and a Matthews correlation coefficient (MCC) of 0.861161. The training and testing times of the SVM classifier are greatly reduced.
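
Since accuracy and MCC are reported side by side, a short sketch of how both follow from a binary confusion matrix may help. The counts below are made up for illustration and are unrelated to the results of Li et al.; the function names are hypothetical.

```python
import math

def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)

def matthews_corrcoef(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).

    Returns 0.0 when the denominator is zero, a common convention."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

# Hypothetical confusion-matrix counts for an attack-vs-normal classifier.
tp, tn, fp, fn = 90, 95, 5, 10
acc = accuracy(tp, tn, fp, fn)
mcc = matthews_corrcoef(tp, tn, fp, fn)
print(f"accuracy = {acc:.4f}, MCC = {mcc:.4f}")
```

MCC ranges from -1 to 1 and accounts for all four cells of the confusion matrix, which makes it a useful complement to accuracy when attack and normal classes are imbalanced.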

An advanced intelligent system using ensemble soft computing techniques is proposed by Sindhu et al. (2012) [52] for a lightweight IDS to detect anomalies in networks. A GA (Genetic Algorithm) is used to extract the feature subset, and a neurotree paradigm is proposed as the classifier. This method extracts 16 features {feature no.: 2, 3, 4, 5, 6, 8, 10, 12, 24, 25, 29, 35, 36, 37, 38 and 40}. The reported detection rate is 98.4%, which is superior to the other methods compared.