(O11.4) 20 Spatial Queries for an Astronomer's Bench

Maria Nieto-Santisteban (Johns Hopkins University)
Tobias Scholl (Technische Universität München)
Alfons Kemper (Technische Universität München)
Alexander Szalay (Johns Hopkins University)

The astronomy community has put tremendous effort into providing global access to its distributed scientific datasets. Motivated by expected data rates of several terabytes a day and petabytes a year from upcoming projects such as the Panoramic Survey Telescope and Rapid Response System (Pan-STARRS) and the Large Synoptic Survey Telescope (LSST), we are designing a benchmark for data-intensive astronomical workbenches. We specify the workload based on the type and size of the datasets, typical queries (such as points-in-region, region-in-region, and cross-match queries), and other data-analysis tasks. We also define quantitative and qualitative metrics and consider different audiences. The proposed benchmark supports both scalability evaluation (How does my system scale?) and system comparison (How does X compare to Y?). Given the variety of data-intensive astronomy applications and use cases, a single best solution is unlikely to exist. By defining a benchmark, we can offer guidance to system designers developing productive environments and to end users evaluating which environment works best for their own analysis and research.
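
To make two of the query types named above concrete, the following is a minimal sketch of a points-in-region query (here, a circular region on the sky) and a nearest-neighbour cross-match between two catalogs. It is not taken from the benchmark itself: the function names, the `(id, ra, dec)` tuple layout, and the brute-force scan are all assumptions chosen for illustration, and a real workbench would presumably rely on a spatial index rather than a linear scan.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees between two sky positions (haversine)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = math.sin((dec2 - dec1) / 2.0)
    sin_dra = math.sin((ra2 - ra1) / 2.0)
    h = sin_ddec ** 2 + math.cos(dec1) * math.cos(dec2) * sin_dra ** 2
    # Clamp to 1.0 to guard against rounding slightly above the asin domain.
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(h))))


def points_in_cone(catalog, ra0, dec0, radius_deg):
    """Points-in-region: rows (id, ra, dec) within radius_deg of (ra0, dec0)."""
    return [
        row for row in catalog
        if angular_separation_deg(row[1], row[2], ra0, dec0) <= radius_deg
    ]


def cross_match(cat_a, cat_b, tol_deg):
    """Cross-match: for each row of cat_a, the nearest row of cat_b within tol_deg.

    Returns (id_a, id_b, separation_deg) tuples; unmatched rows are skipped.
    Hypothetical brute-force version, O(len(cat_a) * len(cat_b)).
    """
    matches = []
    for id_a, ra_a, dec_a in cat_a:
        best = None
        for id_b, ra_b, dec_b in cat_b:
            sep = angular_separation_deg(ra_a, dec_a, ra_b, dec_b)
            if sep <= tol_deg and (best is None or sep < best[1]):
                best = (id_b, sep)
        if best is not None:
            matches.append((id_a, best[0], best[1]))
    return matches
```

Timing such functions over catalogs of increasing size is one simple way to approach the scalability question, and timing the same workload on two systems is one way to approach the comparison question.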