10.12 Data Mining IDSs


Data mining has been around for years; however, its application to intrusion detection is relatively new. The main obstacles are the volume and the complexity of the data to be analyzed. A company can collect millions of records per day that need to be analyzed for malicious activity, so the first task is to determine which data needs to be examined. Data mining can be integrated with anomaly detection and misuse detection to create an IDS that allows an analyst to identify an attack or intrusion on a network more accurately and more quickly.

There has been increased interest in data mining-based approaches to building detection models for IDSs. These methods can generalize models of both known attacks and normal behavior to detect unknown attacks. The models can also be generated faster and with more automation than manually encoded models, which require difficult analysis of audit data by domain experts. Several effective data mining techniques for detecting intrusions have been developed, many of which perform as well as or better than systems engineered by domain experts.

The key ideas are twofold: first, to use data mining techniques to discover consistent and useful patterns of system features that describe program and user behavior; second, to use the set of relevant system features to compute inductively learned classifiers that can recognize anomalies and known intrusions. Experiments using sendmail system call data and network tcpdump data have produced concise and accurate models for detecting anomalies. Machine-learning algorithms have been used to compute the intra- and inter-audit record patterns, which are essential in describing program or user behavior. The discovered patterns can guide the audit data-gathering process and facilitate feature selection.
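The idea of modeling program behavior from system call data can be illustrated with a small sketch. It is not the method used in the experiments mentioned above: instead of an inductively learned classifier, it uses a plain lookup table of short system call sequences seen during normal operation and scores a new trace by the fraction of its sequences that were never seen. The function names, the window length of 3, and the sample traces are all assumptions made for illustration.

```python
from typing import Iterable, List, Set, Tuple

Window = Tuple[str, ...]


def windows(trace: List[str], n: int) -> List[Window]:
    """Return every length-n sliding window over a system call trace."""
    return [tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)]


def learn_normal(traces: Iterable[List[str]], n: int = 3) -> Set[Window]:
    """Collect the set of windows observed in normal (attack-free) traces."""
    normal: Set[Window] = set()
    for trace in traces:
        normal.update(windows(trace, n))
    return normal


def anomaly_score(trace: List[str], normal: Set[Window], n: int = 3) -> float:
    """Fraction of windows in the trace that never occurred in training."""
    seen = windows(trace, n)
    if not seen:
        return 0.0
    misses = sum(1 for w in seen if w not in normal)
    return misses / len(seen)


# Hypothetical traces; real data would come from system call auditing.
normal_traces = [
    ["open", "read", "mmap", "close", "open", "read", "close"],
    ["open", "mmap", "read", "close"],
]
model = learn_normal(normal_traces)

familiar = anomaly_score(["open", "read", "mmap", "close"], model)
suspicious = anomaly_score(["open", "read", "execve", "socket", "close"], model)
print(familiar, suspicious)
```

A trace made only of familiar sequences scores low, while one that contains unseen sequences scores high. A learned classifier would replace the exact-match lookup with rules that generalize beyond the sequences seen in training.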

IDSs have been developed using neural networks. The idea is to train a model to predict a user's next action or command based on patterns of historical behavior. The network is trained on a set of representative user commands. After the training period, the network tries to match the user's actual commands against the profile it has learned. Any incorrectly predicted events are considered deviations from the user's established profile. Some advantages of using neural networks are that they cope well with noisy data, their success does not depend on statistical assumptions about the nature of the underlying data, and they are easier to modify for new user profiles.
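The next-command idea can be sketched with a very small network. The example below is a minimal illustration, not a description of any deployed system: it assumes a single-layer softmax network that sees only the previous command, whereas a practical model would likely use hidden layers and a longer command history. The class name, training parameters, and command histories are hypothetical.

```python
import math
import random
from typing import Dict, List, Tuple


class NextCommandNet:
    """Single-layer softmax network: previous command -> next command."""

    def __init__(self, vocab: List[str], seed: int = 0) -> None:
        rng = random.Random(seed)
        self.vocab = vocab
        self.index: Dict[str, int] = {c: i for i, c in enumerate(vocab)}
        k = len(vocab)
        # weights[i][j] is the score of next command j given previous command i.
        self.weights = [[rng.uniform(-0.01, 0.01) for _ in range(k)]
                        for _ in range(k)]

    def probabilities(self, prev: str) -> List[float]:
        """Softmax over the scores for the row selected by the one-hot input."""
        scores = self.weights[self.index[prev]]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        return [e / total for e in exps]

    def train(self, history: List[str], epochs: int = 200,
              rate: float = 0.5) -> None:
        """Stochastic gradient descent on the cross-entropy loss."""
        pairs = list(zip(history, history[1:]))
        for _ in range(epochs):
            for prev, nxt in pairs:
                probs = self.probabilities(prev)
                row = self.weights[self.index[prev]]
                target = self.index[nxt]
                # Softmax cross-entropy gradient: p - one_hot(target).
                for j, p in enumerate(probs):
                    row[j] -= rate * (p - (1.0 if j == target else 0.0))

    def predict(self, prev: str) -> str:
        probs = self.probabilities(prev)
        return self.vocab[probs.index(max(probs))]


def deviations(net: NextCommandNet, session: List[str]) -> List[Tuple[str, str]]:
    """Return (previous, actual) pairs the network failed to predict.

    Commands absent from the training vocabulary always count as deviations.
    """
    return [(p, a) for p, a in zip(session, session[1:])
            if p not in net.index or net.predict(p) != a]


# Hypothetical command history for one user.
history = ["cd", "ls", "vi", "make", "cd", "ls", "vi", "make",
           "cd", "ls", "cat", "cd", "ls", "vi", "make"]
net = NextCommandNet(sorted(set(history)))
net.train(history)

usual = deviations(net, ["cd", "ls", "vi", "make"])
unusual = deviations(net, ["cd", "ls", "nc", "wget"])
print(usual, unusual)
```

Each entry returned by `deviations` is an incorrectly predicted event in the sense described above. How many such events should raise an alert is a tuning decision that this sketch leaves open.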

IDSs have also been constructed using machine-learning algorithms to create a massive decision tree of thousands of statistical "rules" of acceptable user and system behavior. Branches on the decision tree are labeled with conditional probabilities. These machine-learning decision trees can be trained from a few days of data. However, they cannot be updated to learn new rules as usage patterns change. With these machine-learning IDSs, activity is considered abnormal if it does not match a branch in the decision tree or if it matches a branch with low conditional probability.
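The two abnormality conditions, no matching branch and a matching branch with low conditional probability, can be sketched as a walk down a tree. The example below assumes a hand-built tree rather than one learned from data, and the `Node` and `Branch` classes, the attribute names, the probabilities, and the 0.05 threshold are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Node:
    """A tree node; a node with no attribute is a leaf."""
    attribute: Optional[str] = None
    children: Dict[str, "Branch"] = field(default_factory=dict)


@dataclass
class Branch:
    """An edge labeled with P(value | path so far) and the node it leads to."""
    probability: float
    child: Node


def is_abnormal(root: Node, event: Dict[str, str],
                threshold: float = 0.05) -> bool:
    """Walk the tree; flag a missing branch or a low-probability branch."""
    node = root
    while node.attribute is not None:
        value = event.get(node.attribute)
        branch = node.children.get(value)
        if branch is None:
            return True   # activity matches no branch in the tree
        if branch.probability < threshold:
            return True   # activity matches a low-probability branch
        node = branch.child
    return False


# Hypothetical tree: first test the user, then the time of the login.
leaf = Node()
tree = Node("user", {
    "alice": Branch(0.6, Node("period", {
        "day": Branch(0.97, leaf),
        "night": Branch(0.03, leaf),
    })),
    "bob": Branch(0.4, Node("period", {
        "day": Branch(0.6, leaf),
        "night": Branch(0.4, leaf),
    })),
})

print(is_abnormal(tree, {"user": "alice", "period": "night"}))
```

This sketch applies the threshold to each branch separately. Multiplying the probabilities along the path and thresholding the product would be an equally plausible reading of the description.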

Security of network systems is becoming increasingly important as more and more sensitive information is being stored and manipulated online. IDSs have thus become a critical technology to help protect networks and systems. As we have seen, most IDSs are based on hand-crafted signatures developed by the manual encoding of expert knowledge. These systems match activity on the system being monitored to known signatures of attacks. The major problem with this approach is that the system cannot generalize to detect new attacks.

Data mining can help improve intrusion detection by adding a new level of focus to anomaly detection. Data mining can automate the process of detection by identifying bounds for valid network activity; it can assist security analysts in their ability to distinguish attack activity from common everyday traffic on the network. Data mining can also play a vital role in the area of data reduction. Current data mining techniques have the ability to identify or extract the data that is most relevant and provide analysts with different views of the log files to aid in their analysis.
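Identifying bounds for valid activity and reducing the data an analyst must read can be sketched together. The example below assumes one simple way of setting bounds, the mean plus or minus three standard deviations of each feature in attack-free traffic, and then keeps only the records that fall outside them. The feature names, the sample values, and the choice of three standard deviations are assumptions, not figures from the text.

```python
import statistics
from typing import Dict, List, Tuple

Record = Dict[str, float]
Bounds = Dict[str, Tuple[float, float]]


def learn_bounds(normal: List[Record], k: float = 3.0) -> Bounds:
    """Per-feature bounds: mean +/- k population standard deviations."""
    bounds: Bounds = {}
    for feature in normal[0]:
        values = [r[feature] for r in normal]
        mean = statistics.mean(values)
        spread = statistics.pstdev(values)
        bounds[feature] = (mean - k * spread, mean + k * spread)
    return bounds


def out_of_bounds(record: Record, bounds: Bounds) -> List[str]:
    """Names of the features whose values fall outside the learned bounds."""
    return [f for f, (low, high) in bounds.items()
            if not low <= record[f] <= high]


def reduce_log(records: List[Record], bounds: Bounds) -> List[Record]:
    """Data reduction: keep only records with at least one violation."""
    return [r for r in records if out_of_bounds(r, bounds)]


# Hypothetical per-connection summaries from attack-free traffic.
baseline = [
    {"bytes": 500, "conns": 4},
    {"bytes": 520, "conns": 5},
    {"bytes": 480, "conns": 6},
    {"bytes": 510, "conns": 5},
    {"bytes": 490, "conns": 5},
]
bounds = learn_bounds(baseline)

today = [
    {"bytes": 505, "conns": 5},
    {"bytes": 9000, "conns": 5},
    {"bytes": 495, "conns": 40},
    {"bytes": 515, "conns": 4},
]
flagged = reduce_log(today, bounds)
print(flagged)
```

Only the flagged records reach the analyst, and `out_of_bounds` reports which feature caused each flag, which gives a simple alternative view of the same log.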




Investigative Data Mining for Security and Criminal Detection
ISBN: 0750676132
Year: 2005
Pages: 232
Authors: Jesus Mena
