Data Analysis
Fraud detection
Developed a prototype system in Lisp for automatic fraud detection based on multivariate statistical techniques. The system was applied to KDD99, a widely used intrusion detection dataset. On data representing half a million internet sessions, it detected more than 98.5% of attacks across all attack types, with a false-positive rate of 0.5%. These results were obtained with all attack types, not the reduced set of five attack types usually reported. The project was undertaken with the Columbia University Intrusion Detection Systems Lab (New York) to find out how multivariate statistical techniques perform in an area where knowledge representation and reasoning are more popular.
Although this system was not entered into the competition, lab colleagues reported that its results were much better than those reported for the winner of the KDD99 cup.
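The specific multivariate techniques used in the prototype are not described here. As a purely illustrative sketch of the general idea, the Python fragment below scores a session by its squared Mahalanobis distance from a profile of normal traffic and flags it when the distance exceeds a chi-square cutoff. The two features, the sample data, the threshold, and all function names are assumptions made for this example, not details of the original Lisp system.

```python
from statistics import mean

def fit_profile(rows):
    """Estimate the mean vector and covariance terms of 2-feature rows."""
    n = len(rows)
    mx = mean(r[0] for r in rows)
    my = mean(r[1] for r in rows)
    sxx = sum((r[0] - mx) ** 2 for r in rows) / (n - 1)
    syy = sum((r[1] - my) ** 2 for r in rows) / (n - 1)
    sxy = sum((r[0] - mx) * (r[1] - my) for r in rows) / (n - 1)
    return (mx, my), (sxx, sxy, syy)

def mahalanobis_sq(point, centre, cov):
    """Squared Mahalanobis distance, using the closed-form 2x2 inverse."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = point[0] - centre[0], point[1] - centre[1]
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det

# Hypothetical "normal" sessions: (duration in seconds, kilobytes sent).
normal = [(1.0, 10.0), (1.2, 12.5), (0.9, 9.0),
          (1.1, 11.5), (1.3, 12.0), (0.8, 8.5)]
centre, cov = fit_profile(normal)

# Assumed cutoff: roughly the 99th percentile of a chi-square
# distribution with 2 degrees of freedom.
THRESHOLD = 9.21

def is_suspicious(session):
    """Flag a session that lies far from the normal-traffic profile."""
    return mahalanobis_sq(session, centre, cov) > THRESHOLD
```

A session such as (1.0, 10.5) lies close to the profile and is not flagged, while (1.0, 40.0) sends far more data than its duration suggests and is flagged. A real detector would use many more features and a much larger training sample.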
Computer performance modelling
Developed a multivariate regression model that accurately (> 97%) models load and response times from transaction counts for the systems running on a given computer. This statistical model makes it possible to produce reliable estimates of response time per transaction. Such a model, extended with queuing theory, is necessary to predict computer load and response times accurately over longer periods of time.
The same model was used to predict how computers would behave with and without changes in the system and/or changes in the application mix.
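The form of the regression model is not given here. A minimal sketch of the idea, assuming an ordinary least-squares fit of load on per-interval transaction counts, might look like the following Python fragment. The two transaction types, the sample figures, and the function names are hypothetical, and the data is constructed so that load = 5 + 0.2 * a + 0.5 * b exactly.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit_least_squares(counts, load):
    """Fit load ~ b0 + sum_j b_j * counts_j via the normal equations."""
    X = [[1.0] + list(row) for row in counts]
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(X, load)) for i in range(k)]
    return solve(xtx, xty)

def predict(coef, row):
    """Predicted load for one interval's transaction counts."""
    return coef[0] + sum(b * x for b, x in zip(coef[1:], row))

# Hypothetical measurements: counts of two transaction types per interval,
# and the CPU load (percent) observed in the same interval.
counts = [(10, 4), (20, 3), (15, 8), (30, 10), (25, 2)]
load = [9.0, 10.5, 12.0, 16.0, 11.0]

coef = fit_least_squares(counts, load)
forecast = predict(coef, (40, 5))  # a changed application mix
```

Each fitted coefficient can be read as the load contributed by one transaction of that type, which is what allows a forecast for a changed application mix. Solving the normal equations directly is adequate for a small illustration; a production fit would normally use a numerically more robust method.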
Algorithms
Developed and implemented a new, optimal algorithm for monotone regression, essential for multidimensional scaling of data with ordinal information. The problem is equivalent to the Convex Hull problem, for which the optimal algorithm was discovered in the same year (1976). The URL shows that some of the routines are used today in their original form.
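For readers unfamiliar with monotone regression, the Python sketch below shows the textbook pool-adjacent-violators approach, which finds the non-decreasing sequence closest to the input in the least-squares sense. It is not claimed to be the algorithm described above; the function name and the optional weights argument are choices made for this example.

```python
def monotone_regression(y, w=None):
    """Least-squares non-decreasing fit to y by pooling adjacent violators."""
    if w is None:
        w = [1.0] * len(y)
    blocks = []  # each block: [weighted mean, total weight, point count]
    for value, weight in zip(y, w):
        blocks.append([float(value), float(weight), 1])
        # Merge backwards while the last two blocks are out of order.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, n1 + n2])
    fitted = []
    for block_mean, _, count in blocks:
        fitted.extend([block_mean] * count)
    return fitted

fitted = monotone_regression([1, 3, 2, 4])  # [1.0, 2.5, 2.5, 4.0]
```

Each input point is pushed once and every merge removes a block, so the running time is linear in the number of points. The link to convex hulls comes from the fact that the fitted values are the slopes of the greatest convex minorant of the cumulative-sum diagram of the data.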
"Solutions as business"
Highlights
Work
Developed a prototype system in Lisp for automatic fraud detection based on multivariate statistical techniques.
Developed a multivariate regression model.
Developed and implemented a new, optimal algorithm for monotone regression.