A STUDY OF FEATURE SELECTION METHODS IN INTRUSION DETECTION SYSTEM: HYBRID METHOD

HYBRID METHOD

Ng et al. (2003) present a feature importance ranking methodology based on the stochastic radial basis function neural network output sensitivity measure (RBFNN-SM). RBFNN-SM is used to evaluate the features only for the normal class and six classes of denial of service (DoS) attacks. The experiments show that the 8 most significant sensitive features {feature no.: 2, 24, 23, 29, 32, 34, 33 and 36} are enough to classify normal traffic and DoS attacks, and the computation time drops from 23 seconds to 9 seconds. The classification accuracies for normal and DoS traffic are 99.77% and 99.06%, respectively. With 8 features (all 41 features in parentheses), the false alarm rates (FAR) are 0.18% (0.01%) and 0.27% (0.03%), the FPR is 0.93% (0.70%), and the values reported for training and testing are 0.94% and (0.71%), respectively.
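The idea of ranking features by output sensitivity can be illustrated with a small Monte Carlo sketch. It assumes a Gaussian RBF network with given centres, one shared width and linear output weights, and it scores each feature by the mean squared change in network output when only that feature is perturbed uniformly in [-q, q]. This is a simplified stand-in, not the exact RBFNN-SM formulation of Ng et al.; the function names and the parameters `q` and `n_draws` are hypothetical.

```python
import numpy as np


def rbf_output(X, centers, width, weights):
    """Gaussian RBF network: f(x) = sum_j w_j * exp(-||x - c_j||^2 / (2 * width^2))."""
    # Squared distance from every sample to every centre, shape (n_samples, n_centers)
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * width ** 2)) @ weights


def stochastic_sensitivity(X, centers, width, weights, q=0.1, n_draws=200, seed=0):
    """Monte Carlo estimate, per feature, of E[(f(x + dx) - f(x))^2], where only
    that feature is perturbed by dx ~ Uniform(-q, q)."""
    rng = np.random.default_rng(seed)
    base = rbf_output(X, centers, width, weights)
    scores = np.zeros(X.shape[1])
    for i in range(X.shape[1]):
        total = 0.0
        for _ in range(n_draws):
            X_pert = X.copy()
            X_pert[:, i] += rng.uniform(-q, q, size=X.shape[0])
            total += np.mean((rbf_output(X_pert, centers, width, weights) - base) ** 2)
        scores[i] = total / n_draws
    return scores


# Toy network in which only feature 0 separates the two centres.
centers = np.array([[-1.0, 0.0], [1.0, 0.0]])
weights = np.array([1.0, -1.0])
X = np.array([[0.0, 0.0], [0.5, 0.0]])
scores = stochastic_sensitivity(X, centers, width=1.0, weights=weights)
ranking = np.argsort(scores)[::-1]  # most sensitive feature first
```

Keeping the top-ranked features by such a score and retraining on them is the general pattern behind the 8-feature result above.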

Shazzad and Park (2005) proposed a fast hybrid feature selection method to determine an optimal feature set. The method fuses Correlation-based Feature Selection (CFS), Support Vector Machines (SVM) and a Genetic Algorithm (GA): the GA generates feature subsets, which are then evaluated by CFS and SVM. The 12 selected features are {feature no.: 1, 6, 12, 14, 23, 24, 25, 31, 32, 37, 40 and 41}. On average, the optimal subset achieves a detection rate (DR) of 99.56% and an FPR of 37.5%.
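The CFS side of this fusion can be sketched with Hall's merit heuristic, Merit(S) = k * mean(r_cf) / sqrt(k + k(k - 1) * mean(r_ff)), used as the fitness of a bit-mask genetic algorithm. The sketch assumes numeric features and uses absolute Pearson correlation for r_cf and r_ff. The SVM evaluation step of Shazzad and Park is omitted, and the GA operators (binary tournament selection, one-point crossover, bit-flip mutation, elitism) and parameter values are illustrative choices, not those of the paper.

```python
import numpy as np


def cfs_merit(X, y, subset):
    """Hall's CFS merit of a feature subset, with |Pearson r| as the correlation."""
    k = len(subset)
    if k == 0:
        return 0.0
    # Mean feature-class correlation (relevance)
    r_cf = np.mean([abs(np.corrcoef(X[:, i], y)[0, 1]) for i in subset])
    # Mean feature-feature correlation (redundancy)
    pairs = [(a, b) for pos, a in enumerate(subset) for b in subset[pos + 1:]]
    r_ff = np.mean([abs(np.corrcoef(X[:, a], X[:, b])[0, 1]) for a, b in pairs]) if pairs else 0.0
    return k * r_cf / np.sqrt(k + k * (k - 1) * r_ff)


def ga_select(X, y, pop_size=20, generations=30, p_mut=0.05, seed=0):
    """Evolve feature bit masks with CFS merit as the fitness function."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    pop = rng.integers(0, 2, size=(pop_size, d))

    def fitness(mask):
        return cfs_merit(X, y, list(np.flatnonzero(mask)))

    for _ in range(generations):
        fit = np.array([fitness(m) for m in pop])
        # Binary tournament selection
        idx = rng.integers(0, pop_size, size=(pop_size, 2))
        winners = np.where(fit[idx[:, 0]] >= fit[idx[:, 1]], idx[:, 0], idx[:, 1])
        parents = pop[winners]
        children = parents.copy()
        # One-point crossover on consecutive pairs of parents
        for j in range(0, pop_size - 1, 2):
            cut = rng.integers(1, d)
            children[j, cut:] = parents[j + 1, cut:]
            children[j + 1, cut:] = parents[j, cut:]
        # Bit-flip mutation
        flip = rng.random(children.shape) < p_mut
        children[flip] = 1 - children[flip]
        # Elitism: carry over the best mask of the previous generation
        children[0] = pop[np.argmax(fit)]
        pop = children
    fit = np.array([fitness(m) for m in pop])
    return np.flatnonzero(pop[np.argmax(fit)])


# Synthetic data: feature 1 duplicates feature 0, features 2 and 3 are noise.
rng = np.random.default_rng(1)
y = rng.integers(0, 2, 500).astype(float)
x0 = y + 0.1 * rng.normal(size=500)
x1 = x0 + 0.01 * rng.normal(size=500)
X = np.column_stack([x0, x1, rng.normal(size=500), rng.normal(size=500)])
selected = ga_select(X, y)
```

The merit rewards relevance to the class and penalizes redundancy among the chosen features, which is why adding noise features lowers it.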

Chebrolu, Abraham and Thomas (2005) investigated the performance of two feature selection techniques, Bayesian Networks (BN) and Classification and Regression Trees (CART), and built an ensemble of both classifiers for an IDS, which performs best in classifying R2L and DoS attacks. Seventeen important features {feature no.: 1, 2, 3, 5, 7, 8, 11, 12, 14, 17, 22, 23, 24, 25, 26, 30 and 32} are selected with the Markov blanket model, and a BN classifier is constructed and tested on them. Twelve features {feature no.: 3, 5, 6, 12, 23, 24, 25, 28, 31, 32, 33 and 35} are selected by a decision tree, and a CART classifier is constructed and tested on them. The Normal class is classified 100% correctly, and the accuracies for the U2R and R2L classes increase when the 12-variable reduced data set is used. CART is observed to classify accurately on smaller data sets. In the ensemble approach, the BN and CART models are first constructed individually, and the ensemble is then applied to the 12-, 17- and 41-variable data sets. The ensemble model detects Normal, Probe and DoS with 100% accuracy, and U2R and R2L with 84% and 99.47% accuracy, respectively.
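The Markov blanket of the class node is the set of variables that makes the class conditionally independent of every other variable in a Bayesian network: its parents, its children, and its children's other parents. The sketch below only reads the blanket off a known network structure, given as a dictionary from each node to its parents; learning that structure from the KDD data, as the BN approach requires, is not shown, and the node names are hypothetical.

```python
def markov_blanket(parents, target):
    """Return the Markov blanket of `target` in a DAG given as {node: set_of_parents}."""
    blanket = set(parents.get(target, set()))                          # parents
    children = {node for node, ps in parents.items() if target in ps}  # children
    blanket |= children
    for child in children:
        blanket |= parents[child]                                      # children's other parents
    blanket.discard(target)
    return blanket


# Hypothetical network: f4 -> class, class -> f1, class -> f2, f3 -> f2, f1 -> f5
dag = {
    "class": {"f4"},
    "f1": {"class"},
    "f2": {"class", "f3"},
    "f3": set(),
    "f4": set(),
    "f5": {"f1"},
}
print(sorted(markov_blanket(dag, "class")))  # ['f1', 'f2', 'f3', 'f4']
```

Here `f5` is dropped because, once `f1` is known, it carries no further information about the class.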

Chen et al. (2007) [55] propose a hybrid approach named C4.5-PCA-C4.5, which uses PCA (Principal Component Analysis) and the C4.5 decision tree for feature selection and C4.5 as the classifier. The important features extracted are {feature no.: 33, 34, 4, 1, 3, 10 and 22}. C4.5-PCA-C4.5 is compared with four other systems: C4.5-ALL, C4.5-PCA, SVM-CFS and SVM-CFS-SVM. The experimental results show that C4.5-PCA-C4.5 has a lower testing time, fast training and testing, the highest TPR and the lowest FPR. Its average model-building time is 6 seconds.

Lee et al. (2007) [56] use two machine learning algorithms: Random Forests (RF) for feature selection and the Minimax Probability Machine (MPM) for intrusion detection. The top 5 important features {feature no.: 23, 6, 29, 3 and 5} are selected, and only Denial of Service (DoS) attacks are used. The detection rate is 99.84% and the average simulation time is 0.1039 seconds.
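For the PCA step of C4.5-PCA-C4.5 described above, one common way to map principal components back onto the original features is to weight each feature's absolute loading on the leading components by those components' explained-variance ratios. The sketch assumes that scheme; Chen et al. may combine PCA and C4.5 differently. Because KDD features have very different scales, standardizing the columns before calling the function is usually advisable; the toy data below is deliberately unscaled to make the dominant feature obvious.

```python
import numpy as np


def pca_feature_ranking(X, n_components):
    """Score each original feature by |loading| on the top principal components,
    weighted by each component's explained-variance ratio."""
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)            # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:n_components]    # indices of the largest ones
    ratio = eigvals[top] / eigvals.sum()              # explained-variance ratios
    scores = np.abs(eigvecs[:, top]) @ ratio          # one score per feature
    return np.argsort(scores)[::-1], scores


rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3)) * np.array([0.1, 0.2, 5.0])  # feature 2 dominates the variance
ranking, scores = pca_feature_ranking(X, n_components=1)
```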

Wei Wang et al. (2008) used both filter and wrapper schemes for feature selection. An information gain (IG) based filter model and a wrapper model based on Bayesian networks (BN) and decision trees (C4.5) are employed to select features for network intrusion detection, with BN and C4.5 also serving as the classifiers. The experimental results and the 10 features selected for each class are shown in Table 15.
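The IG filter scores a feature by how much knowing its value reduces the entropy of the class label, IG(C; F) = H(C) - sum_v P(F = v) H(C | F = v). The minimal sketch below assumes discrete feature values (continuous KDD features would first need discretizing); the toy columns are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a sequence of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """IG(C; F) = H(C) - sum_v P(F = v) * H(C | F = v) for a discrete feature F."""
    n = len(labels)
    conditional = 0.0
    for value, count in Counter(feature).items():
        subset = [c for f, c in zip(feature, labels) if f == value]
        conditional += count / n * entropy(subset)
    return entropy(labels) - conditional


labels = ["normal", "attack"] * 4
protocol = ["tcp", "udp"] * 4          # hypothetical feature, perfectly aligned with the class
flag = ["SF", "SF", "REJ", "REJ"] * 2  # hypothetical feature, independent of the class
print(information_gain(protocol, labels))  # 1.0
print(information_gain(flag, labels))      # 0.0
```

Ranking all features by this score and keeping the top entries per class is the filter half of the scheme; the wrapper half would then evaluate candidate subsets with BN or C4.5.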

Table 15. Experimental results and the 10 features selected for each class (Wei Wang et al., 2008).

Hong and Haibo (2009) proposed a new hybrid feature selection algorithm to build a lightweight network IDS. The Chi-square test and an enhanced C4.5 algorithm are used for feature selection in the preprocessing phase. The top fifteen features extracted by the Chi-square algorithm are {feature no.: 5, 3, 23, 35, 4, 8, 30, 34, 36, 6, 33, 38, 24, 25 and 2}. The top five features extracted by the C4.5 and C4.5-Chi2 methods are {feature no.: 25, 4, 2, 5 and 29} and {feature no.: 5, 3, 4, 8 and 25}, respectively. The experimental results are shown in Table 16.
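The Chi-square ranking step can be sketched as computing Pearson's chi-square statistic on the contingency table of each discrete feature against the class and sorting features by it. The sketch assumes discrete feature values, and the column contents are hypothetical; it does not cover the enhanced C4.5 stage.

```python
from collections import Counter


def chi_square(feature, labels):
    """Pearson chi-square statistic between a discrete feature and the class label."""
    n = len(labels)
    feature_counts, class_counts = Counter(feature), Counter(labels)
    joint = Counter(zip(feature, labels))  # observed contingency-table cells
    stat = 0.0
    for v, n_v in feature_counts.items():
        for c, n_c in class_counts.items():
            expected = n_v * n_c / n       # cell count expected under independence
            stat += (joint[(v, c)] - expected) ** 2 / expected
    return stat


def rank_features(columns, labels):
    """Return feature indices sorted by decreasing chi-square score."""
    return sorted(range(len(columns)), key=lambda i: chi_square(columns[i], labels), reverse=True)


labels = ["normal", "attack"] * 4
columns = [
    ["SF", "SF", "REJ", "REJ"] * 2,  # hypothetical feature independent of the class
    ["tcp", "udp"] * 4,              # hypothetical feature aligned with the class
]
print(chi_square(columns[1], labels))  # 8.0
print(rank_features(columns, labels))  # [1, 0]
```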

Table 16. Experimental results (Hong and Haibo, 2009).

Xiang et al. (2009) present a hybrid method named Robust Artificial Intelligence Selection Algorithm (RAIS). Mutual information and an artificial intelligence method are used for feature subset selection, and SVMs serve as the classifier. The selected features are not reported in the paper. The experimental results show that RAIS achieves the lowest false alarm rate (3.49%) and the highest accuracy (99.01%) and detection rate (99.27%).
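Because RAIS is not specified in enough detail here, the sketch below shows a different, well-known mutual-information selector instead: Battiti's MIFS, which greedily adds the feature maximizing I(f; C) - beta * sum over already selected s of I(f; s). It is not RAIS; it only illustrates how mutual information can trade relevance against redundancy. It assumes discrete features, and `beta` and the toy data are illustrative.

```python
import math
from collections import Counter


def mutual_information(a, b):
    """I(A; B) in bits for two discrete sequences of equal length."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(c / n * math.log2(c * n / (pa[x] * pb[y])) for (x, y), c in pab.items())


def mifs(features, labels, k, beta=0.5):
    """Battiti's MIFS: greedily pick the feature maximising
    I(f; C) - beta * sum_{s in selected} I(f; s)."""
    relevance = [mutual_information(f, labels) for f in features]
    selected, remaining = [], list(range(len(features)))
    while remaining and len(selected) < k:
        scores = {
            i: relevance[i] - beta * sum(mutual_information(features[i], features[s]) for s in selected)
            for i in remaining
        }
        best = max(remaining, key=scores.get)  # ties go to the earlier feature
        selected.append(best)
        remaining.remove(best)
    return selected


labels = [0, 1, 0, 1, 0, 1, 0, 1]
f0 = [0, 1, 0, 1, 0, 1, 0, 0]  # informative (one label flipped)
f1 = list(f0)                  # exact duplicate of f0, fully redundant
f2 = [1, 1, 0, 1, 0, 1, 0, 1]  # equally informative, less redundant with f0
print(sorted(mifs([f0, f1, f2], labels, k=2)))  # [0, 2]
```

The duplicate `f1` is skipped because its redundancy with `f0` outweighs its relevance.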

Zaman and Karray (2009) proposed a novel and simple method named Enhanced Support Vector Decision Function (ESVDF) for feature selection. The method uses Support Vector Machines (SVMs) together with the Forward Selection Ranking (FSR) and Backward Elimination Ranking (BER) algorithms: ESVDF (SVDF/FSR or SVDF/BER) applies the SVDF within the FSR and BER approaches to select the most effective feature set. Two classifiers, Neural Networks (NNs) and SVMs, are used to evaluate the features. The experimental results are shown in Table 17; the names of the selected features are not given.
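The SVDF/FSR idea can be sketched under simplifying assumptions: a linear SVM (trained here with the Pegasos sub-gradient method) supplies the decision function F(x) = w·x + b, the absolute weight |w_i| is taken as the feature score, and forward selection adds features in score order, keeping each one only if validation accuracy improves. Zaman and Karray's exact weighting formula and stopping rule may differ, and the BER variant and NN evaluation are not shown; all function names and parameter values are hypothetical.

```python
import numpy as np


def add_bias(X):
    """Append a constant column so the bias is learned as an ordinary weight."""
    return np.hstack([X, np.ones((X.shape[0], 1))])


def train_linear_svm(X, y, lam=0.01, epochs=20, seed=0):
    """Pegasos sub-gradient training of a linear SVM; labels y must be -1 or +1."""
    rng = np.random.default_rng(seed)
    w, t = np.zeros(X.shape[1]), 0
    for _ in range(epochs):
        for i in rng.permutation(X.shape[0]):
            t += 1
            eta = 1.0 / (lam * t)
            violated = y[i] * (X[i] @ w) < 1  # hinge-loss margin check before the update
            w *= 1.0 - eta * lam              # shrink step from the L2 regulariser
            if violated:
                w += eta * y[i] * X[i]
    return w


def accuracy(X, y, w):
    return np.mean(np.sign(X @ w) == y)


def forward_selection_ranking(X_tr, y_tr, X_val, y_val):
    """Rank features by |w_i| of a linear SVM trained on all of them, then add them
    in that order, keeping a feature only if validation accuracy improves."""
    w_all = train_linear_svm(add_bias(X_tr), y_tr)
    order = np.argsort(np.abs(w_all[:-1]))[::-1]  # drop the bias weight before ranking
    selected, best_acc = [], 0.0
    for i in order:
        candidate = selected + [int(i)]
        w = train_linear_svm(add_bias(X_tr[:, candidate]), y_tr)
        acc = accuracy(add_bias(X_val[:, candidate]), y_val, w)
        if acc > best_acc:
            selected, best_acc = candidate, acc
    return selected, best_acc


# Synthetic data: only feature 0 determines the class, with a margin of 1 around zero.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
y = np.where(X[:, 0] > 0, 1.0, -1.0)
X[:, 0] = y * (1.0 + np.abs(X[:, 0]))
selected, acc = forward_selection_ranking(X[:200], y[:200], X[200:], y[200:])
```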

Table 17. Experimental results (Zaman and Karray, 2009).

Ming-Yang Su (2011) proposed a feature selection method for detecting DoS/DDoS attacks in real time, aimed at designing an anomaly-based NIDS. A genetic algorithm (GA) combined with k-nearest neighbor (KNN) is used for feature selection and weighting: the KNN classification result serves as the fitness function with which the GA evolves the feature weight vectors. In the training phase, 35 initial features are weighted. The top 19 features are used for known attacks and the top 28 features for unknown attacks. The selected features are not listed in the paper. An overall accuracy of 97.42% is obtained for known attacks and 78% for unknown attacks.
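The GA/KNN coupling can be sketched as a real-coded GA whose individuals are feature-weight vectors and whose fitness is the validation accuracy of a k-NN classifier using weighted Euclidean distance. The operators (truncation selection, uniform crossover, Gaussian mutation, elitism), the parameter values and the synthetic data are assumptions for illustration, not Su's exact configuration.

```python
import numpy as np


def weighted_knn_accuracy(weights, X_tr, y_tr, X_val, y_val, k=3):
    """Fitness: validation accuracy of k-NN whose Euclidean distance scales each
    feature by its weight. Labels must be non-negative integers."""
    diff = (X_val[:, None, :] - X_tr[None, :, :]) * weights
    d2 = (diff ** 2).sum(axis=2)                                            # shape (n_val, n_train)
    nearest = np.argsort(d2, axis=1)[:, :k]
    pred = np.array([np.bincount(y_tr[row]).argmax() for row in nearest])  # majority vote
    return np.mean(pred == y_val)


def evolve_weights(X_tr, y_tr, X_val, y_val, pop_size=16, generations=20, sigma=0.1, seed=0):
    """Real-coded GA over feature-weight vectors in [0, 1], with k-NN accuracy as fitness."""
    rng = np.random.default_rng(seed)
    d = X_tr.shape[1]
    pop = rng.random((pop_size, d))
    for _ in range(generations):
        fit = np.array([weighted_knn_accuracy(w, X_tr, y_tr, X_val, y_val) for w in pop])
        elite = pop[np.argsort(fit)[::-1][: pop_size // 2]]           # truncation selection
        a = elite[rng.integers(0, len(elite), pop_size)]
        b = elite[rng.integers(0, len(elite), pop_size)]
        children = np.where(rng.random((pop_size, d)) < 0.5, a, b)   # uniform crossover
        children += rng.normal(0.0, sigma, children.shape)           # Gaussian mutation
        pop = np.clip(children, 0.0, 1.0)
        pop[0] = elite[0]                                             # elitism
    fit = np.array([weighted_knn_accuracy(w, X_tr, y_tr, X_val, y_val) for w in pop])
    best = np.argmax(fit)
    return pop[best], fit[best]


rng = np.random.default_rng(0)
y = rng.integers(0, 2, 100)
X = np.column_stack([2.0 * y - 1.0 + 0.1 * rng.normal(size=100),  # informative feature
                     rng.normal(size=100), rng.normal(size=100)])  # two noise features
weights, acc = evolve_weights(X[:60], y[:60], X[60:], y[60:])
ranking = np.argsort(weights)[::-1]  # features ordered by evolved weight, largest first
```

Cutting `ranking` at a fixed length corresponds to keeping the "top N" weighted features, as in the top-19 and top-28 sets above.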