Andreas Maunz started his degree in computer science at Freiburg University in October 2001. Since 2004 he has worked for Dr. Christoph Helma and Dr. Andreas Karwath in the Data Mining and Natural Language Processing Labs. Because their work focused on feature mining and string mining, respectively, his tasks included creating a molecule viewer Java applet for the Lazar web interface (C. Helma. Lazy structure-activity relationships (lazar) for the prediction of rodent carcinogenicity and Salmonella mutagenicity, Molecular Diversity, 2006) and optimizing the string miner FAVST for buffering, which is necessary for large string databases such as DNA sequences (S. D. Lee and L. De Raedt. An Efficient Algorithm for Mining String Databases Under Constraints, Lecture Notes in Computer Science, 2004).
In July 2007, he received his M.Sc. in computer science for his quantitative extension to the Lazar algorithm. Subsequently, he was offered a position as a Ph.D. student at the Freiburg Center for Data Analysis and Modeling (Prof. Jens Timmer) in Christoph Helma's working group. He participates in the EU Sens-it-iv project, which is concerned with the development of "in vitro" alternatives to the animal tests currently used for the risk assessment of potential skin or lung sensitizers (http://www.sens-it-iv.eu).
His main interests are machine learning and data mining of large amounts of data, especially the modeling of multivariate data with pattern recognition techniques, including frequency-based, generality-based and other statistical filters. His core competence lies in 2D graph-based representations, which provide a good trade-off between expressiveness and computational complexity. Recently, he has proposed a novel kernel that describes activity-specific rather than purely structural similarity, as well as a method for automatic applicability domain estimation (A. Maunz and C. Helma. Prediction of chemical toxicity with local support vector regression and activity-specific kernels, SAR and QSAR in Environmental Research, 2008).
New Lazar Developments and Data Mining Techniques for the Identification of Structural Alerts
Andreas Maunz and Christoph Helma, Freiburg Center for Data Analysis and Modeling
This talk focuses on the extension of the Lazar system for regression problems and on feature mining techniques for the efficient identification of statistically significant substructures.
We present a novel activity-specific kernel that is used to obtain predictions from a training set with a modified k-nearest-neighbor approach. The endpoints modeled include Fathead Minnow Acute Toxicity, Maximum Recommended Therapeutic Dose and IRIS Lifetime Cancer Risk. The new kernel provides results superior to those of the well-established Tanimoto kernel, and individual predictions are interpretable for toxicological experts without a data mining background.
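To make the contrast between a purely structural and an activity-specific similarity concrete, the following minimal sketch compares the plain Tanimoto similarity with a weighted variant and uses the latter in a similarity-weighted k-nearest-neighbor regression. It is an illustration under stated assumptions, not the kernel presented in the talk: representing compounds as sets of substructure identifiers, the per-feature `weights` dictionary and the function names are all hypothetical choices made here.

```python
from typing import Dict, List, Set, Tuple


def tanimoto(a: Set[str], b: Set[str]) -> float:
    """Plain Tanimoto similarity: shared features divided by all features."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def weighted_tanimoto(a: Set[str], b: Set[str], weights: Dict[str, float]) -> float:
    """Tanimoto similarity in which each feature counts with its weight.

    Assumption for illustration: a weight expresses how strongly a
    substructure is associated with the endpoint (0.0 = irrelevant).
    """
    denominator = sum(weights.get(f, 0.0) for f in a | b)
    if denominator == 0.0:
        return 0.0
    return sum(weights.get(f, 0.0) for f in a & b) / denominator


def knn_predict(
    query: Set[str],
    training: List[Tuple[Set[str], float]],
    weights: Dict[str, float],
    k: int = 3,
) -> float:
    """Similarity-weighted mean of the activities of the k nearest neighbors."""
    scored = sorted(
        ((weighted_tanimoto(query, feats, weights), activity)
         for feats, activity in training),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    total = sum(similarity for similarity, _ in scored)
    if total == 0.0:
        raise ValueError("no training compound is similar to the query")
    return sum(similarity * activity for similarity, activity in scored) / total


# Hypothetical toy data: substructure identifiers and numeric activities.
example_weights = {"a": 0.0, "b": 1.0, "c": 0.0}
example_training = [({"b"}, 2.0), ({"c"}, 10.0)]
example_query = {"a", "b"}
```

With these toy values, the plain Tanimoto similarity between `{"a", "b"}` and `{"b", "c"}` is 1/3, whereas the weighted variant is 1.0, because only the endpoint-relevant feature `b` carries weight. Each prediction can be traced back to the neighbors and shared substructures that produced it, which is the kind of interpretability the abstract refers to.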
The identification of structural alerts has a long tradition in toxicological research. While initial approaches relied on expert knowledge alone, recent approaches combine data mining with statistical criteria. We present a novel, objective feature mining technique that can be used without prior knowledge of toxicological mechanisms. This makes it especially useful for investigating poorly understood endpoints and for generating hypotheses about toxicological mechanisms.
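As an illustration of what a statistical criterion for a candidate structural alert can look like, the sketch below computes the chi-square statistic of a 2x2 contingency table that relates the presence of a substructure to activity. The abstract does not name the test used, so the choice of the chi-square test, the function names and the counts are assumptions made here for illustration only.

```python
def chi_square_2x2(
    active_with: int,
    active_without: int,
    inactive_with: int,
    inactive_without: int,
) -> float:
    """Chi-square statistic (no continuity correction) for a 2x2 table.

    Rows are active / inactive compounds, columns are compounds with /
    without the candidate substructure.
    """
    a, b, c, d = active_with, active_without, inactive_with, inactive_without
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        return 0.0  # a row or column is empty, so no association can be tested
    return n * (a * d - b * c) ** 2 / denominator


# Critical value of the chi-square distribution with one degree of
# freedom at the 5% significance level.
CRITICAL_VALUE_5_PERCENT = 3.841


def is_significant(statistic: float) -> bool:
    """True if the substructure/activity association exceeds the 5% threshold."""
    return statistic > CRITICAL_VALUE_5_PERCENT


# Hypothetical counts: the substructure occurs in 30 of 40 active
# compounds but only in 10 of 40 inactive compounds.
example_statistic = chi_square_2x2(30, 10, 10, 30)
```

For the hypothetical counts above the statistic is 20.0, well above the threshold, so the substructure would be retained as a candidate alert. A table with equal counts in all four cells yields 0.0 and would be discarded.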