
ADASS XIII presentations

Session O3: Algorithm & Classification


O3.1: Promises and challenges in automatic pattern recognition (Invited)

Tin Kam Ho, Bell Labs, Lucent Technologies.

Pattern recognition seeks to identify and model regularities in empirical data by algorithmic processes. Successful application of the established methods requires a good understanding of their behavior and of how well they match a particular context. Difficulties can arise from either the intrinsic complexity of a problem or a mismatch between methods and problems. We describe our efforts to characterize the intrinsic complexity of a classification problem and its relationship to classifier performance. We discuss how Mirage, an exploratory data analysis tool, is designed to address such difficulties.

O3.2: Algorithms for Statistics on Very Large Datasets

Alexander Gray, Carnegie Mellon University, Andrew Moore, Carnegie Mellon University

Several fundamental statistical inferences, particularly nonparametric ones, have become critical tools for scientific data analysis yet do not scale tractably to modern large datasets, where the number of data points is in the tens of thousands and beyond. I will describe very recent tree-based algorithms which have dramatically reduced the computational complexity of 1) kernel density estimation (which also extends to nonparametric regression, classification, and clustering), and 2) the n-point correlation function for arbitrary n. These problems typify a larger class I call 'generalized N-body problems', and the new algorithms typify a new class of N-body solver which, unlike existing solvers for physical N-body problems, can treat many statistical problems for the first time, and which typically yields runtimes of seconds for millions of data points on desktop PCs. Downloadable software is available.
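As an illustration of why tree-based methods help with such problems, the following minimal Python sketch counts the pairs of points closer than a radius r (the raw ingredient of a two-point correlation estimate) using a kd-tree with bounding boxes. It is not the authors' algorithm or software: it is a simplified single-tree variant, and all names (`Node`, `count_within`, `count_pairs`, `leaf_size`) are hypothetical. The key idea it shows is pruning: a whole subtree is skipped when its bounding box lies entirely outside the radius, and counted in one step when it lies entirely inside.

```python
import random


class Node:
    """kd-tree node holding a bounding box and the number of points below it."""

    def __init__(self, points, leaf_size=8, depth=0):
        dim = len(points[0])
        self.count = len(points)
        self.lo = [min(p[d] for p in points) for d in range(dim)]
        self.hi = [max(p[d] for p in points) for d in range(dim)]
        self.points = None
        self.left = self.right = None
        if len(points) <= leaf_size:
            self.points = points  # leaf: keep the points themselves
        else:
            axis = depth % dim  # cycle through the coordinate axes
            pts = sorted(points, key=lambda p: p[axis])
            mid = len(pts) // 2
            self.left = Node(pts[:mid], leaf_size, depth + 1)
            self.right = Node(pts[mid:], leaf_size, depth + 1)


def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def box_dist2(q, node):
    """Squared minimum and maximum distance from q to the node's bounding box."""
    dmin = dmax = 0.0
    for x, lo, hi in zip(q, node.lo, node.hi):
        nearest = max(lo - x, 0.0, x - hi)
        farthest = max(x - lo, hi - x)
        dmin += nearest * nearest
        dmax += farthest * farthest
    return dmin, dmax


def count_within(q, node, r2):
    """Number of points under `node` whose squared distance to q is <= r2."""
    dmin, dmax = box_dist2(q, node)
    if dmin > r2:
        return 0  # box entirely outside the radius: prune
    if dmax <= r2:
        return node.count  # box entirely inside: count without visiting points
    if node.points is not None:
        return sum(1 for p in node.points if dist2(p, q) <= r2)
    return count_within(q, node.left, r2) + count_within(q, node.right, r2)


def count_pairs(points, r):
    """Number of unordered pairs of distinct points separated by at most r."""
    root = Node(points)
    total = sum(count_within(q, root, r * r) for q in points)
    return (total - len(points)) // 2  # drop self-matches, halve ordered pairs


def count_pairs_brute(points, r):
    """Quadratic-time reference implementation, used only for checking."""
    n = len(points)
    return sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if dist2(points[i], points[j]) <= r * r
    )


random.seed(0)
sample = [(random.random(), random.random()) for _ in range(300)]
```

Calling `count_pairs(sample, 0.1)` should return the same value as `count_pairs_brute(sample, 0.1)` while visiting far fewer points. Dual-tree schemes, which compare nodes against nodes rather than points against nodes, take the same pruning idea further.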

O3.3: Source Detection with Bayesian Inference on ROSAT All-Sky Survey Data Sample

Fabrizia Guglielmetti, Max-Planck-Inst. fuer Extraterrestrische Physik (MPE)/ Plasmaphysik (IPP) (Garching, Germany), Wolfgang Voges, MPE, Rainer Fischer, IPP, Guenter Boese, MPE, Volker Dose, IPP

We propose a statistical method for source detection with the aim of discovering faint celestial objects, whether point-like or extended. Our data sample was collected during the ROSAT All-Sky Survey by the Position Sensitive Proportional Counter on board the ROSAT satellite. The sample comprises about 125,000 X-ray sources. 95% of these sources were detected by the Standard Analysis Software System, which was designed mainly for the detection of point-like sources and the determination of their astronomical parameters. Weak and extended sources were either not detected automatically or not satisfactorily analyzed and parameterized. We aim to detect faint and extended sources in order to access the remaining 5% of the sample, with the goal of studying X-ray emitting objects such as clusters of galaxies, groups of galaxies, AGNs, QSOs, and the diffuse X-ray emission. The presented method assigns to each pixel cell (45 × 45 arcsec²) or pixel domain, occupied by a certain number of photons, the probability that it contains either background photons only or an additional source contribution. The presence of the background and the sources is modelled by a mixture of two components. The background component represents the diffuse X-ray emission and the intrinsic instrumental background. The gradually varying background is captured by a bivariate thin-plate spline. The source component describes the sources, which appear as local count enhancements. Each pixel cell (or domain) is characterized by its probability of belonging to one of the two mixture components. For the estimation of the background spline, all the photons contained in every pixel cell are considered, weighted according to their probability of belonging to the background. The ROSAT exposure map and its fluctuations have also been incorporated in the background evaluation. The observatory's point spread function has been taken into account for the source detection.
The method is a one-step approach: background estimation and source estimation are performed simultaneously. The estimation is performed in the framework of Bayesian probability theory, which provides a unique method for dealing with noisy or incomplete data and with uncertainties in models, and for combining information of various types in one concise algorithm. Results of this method will be presented for simulated and ROSAT datasets.
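The two-component mixture idea can be illustrated with a minimal Python sketch that computes, for a single pixel cell, the posterior probability that its photon count includes a source contribution. This is a simplification under stated assumptions, not the method of the abstract: the expected background count is taken as known (rather than estimated jointly with a thin-plate spline), the source is given a single fixed expected count (rather than a prior distribution over intensities), and the names `log_poisson`, `source_probability` and `prior_source` are hypothetical.

```python
import math


def log_poisson(n, mu):
    """Log of the Poisson probability of observing n counts with mean mu."""
    return n * math.log(mu) - mu - math.lgamma(n + 1)


def source_probability(n, background, source, prior_source=0.01):
    """Posterior probability that a cell with n photons contains a source.

    Assumptions (illustrative only):
      background   -- known expected background counts in the cell (> 0)
      source       -- assumed expected source counts when a source is present
      prior_source -- prior probability that any given cell contains a source
    """
    # Log-likelihoods of the two mixture components.
    log_like_bkg = log_poisson(n, background)
    log_like_src = log_poisson(n, background + source)
    # Bayes' theorem in log-odds form, which avoids underflow for large n.
    log_odds = (
        math.log(prior_source)
        - math.log1p(-prior_source)
        + log_like_src
        - log_like_bkg
    )
    return 1.0 / (1.0 + math.exp(-log_odds))


# Posterior source probability for cells holding 0..19 photons,
# with an assumed background of 2 counts and source strength of 5 counts.
probs = [source_probability(n, background=2.0, source=5.0) for n in range(20)]
```

In this sketch a cell with few photons ends up with a source probability below the prior, while the probability rises steadily as the count exceeds what the background alone would explain. The complementary quantity, `1 - source_probability(...)`, plays the role of the background weight described above.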
