Astronomy is entering the era of very large catalogues. Digitizations of photographic plates (USNO, GSC II) and current digital sky surveys (DENIS, 2MASS, Sloan DSS) lead to catalogues of several hundred million objects. The number of parameters for each object (positions, magnitudes, errors, flags, etc.) is also increasing considerably.
The gigabyte is the basic size unit for the resulting catalogues, and it becomes impossible for each astronomer to keep a local copy of the full data set. Queries over the network that extract smaller samples with the pertinent information are thus a requisite for such catalogues.
CDS already provides fast on-line access to some very large catalogues which are fully integrated into other CDS services.
When running queries on databases of several hundred million objects, response time becomes critical. Fast access to the database requires that I/O be reduced as far as possible, which in turn calls for a powerful compression tool to reduce the amount of data.
We use, for each catalogue, a dedicated and optimized compression method to store the data in binary files.
For example, simply by storing only position offsets, the 6 GB of the USNO A1.0 catalogue (Monet et al. 1997) were compressed down to 3.4 GB (Ochsenbein 1998).
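As a rough illustration of the offset idea, and not the actual CDS encoding (for which see Ochsenbein 1998), the sketch below stores each position in a small zone of sky as a pair of 16-bit integer offsets from the zone origin rather than as two 64-bit floats. The zone size, the 0.01 arcsec quantisation step, and the names `encode_zone` and `decode_zone` are assumptions made for this example.

```python
import struct

# Hypothetical parameters, chosen for illustration only.
UNIT_DEG = 0.01 / 3600.0   # quantisation step: 0.01 arcsec, in degrees
ZONE_DEG = 0.1             # side of a square zone: 36000 steps, fits in 16 bits


def encode_zone(ra0, dec0, positions):
    """Pack (ra, dec) pairs lying in the zone whose origin is (ra0, dec0).

    Each coordinate becomes an unsigned 16-bit offset from the zone origin,
    so one position costs 4 bytes instead of 16.
    """
    out = bytearray()
    for ra, dec in positions:
        dra = round((ra - ra0) / UNIT_DEG)
        ddec = round((dec - dec0) / UNIT_DEG)
        # struct.pack raises struct.error if an offset falls outside 0..65535,
        # i.e. if the position does not belong to this zone.
        out += struct.pack("<HH", dra, ddec)
    return bytes(out)


def decode_zone(ra0, dec0, blob):
    """Recover positions (to within the quantisation step) from packed offsets."""
    return [(ra0 + dra * UNIT_DEG, dec0 + ddec * UNIT_DEG)
            for dra, ddec in struct.iter_unpack("<HH", blob)]
```

In this toy layout the saving comes entirely from the zone origin being stored once per zone; a dedicated format can go further by packing magnitudes and flags with the same kind of catalogue-specific bit budget.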
Most of the queries include a selection based on celestial positions, and thus the compressed binary database is indexed on position. Additional filters (on colors or flags) may be applied to the selected sample if necessary.
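A minimal sketch of such a positional index follows, assuming the sky is cut into declination zones of fixed height. The zone height, the record layout (dicts with `ra`, `dec`, `mag`), and the function names are hypothetical and are not taken from the CDS implementation. A cone search visits only the zones that overlap the search circle, tests the exact angular separation, and then applies an optional extra filter.

```python
import math
from collections import defaultdict

ZONE_HEIGHT_DEG = 0.5                      # hypothetical zone height
N_ZONES = int(180.0 / ZONE_HEIGHT_DEG)


def zone_of(dec):
    """Zone number of a declination in degrees (zone 0 touches the south pole)."""
    return min(int((dec + 90.0) / ZONE_HEIGHT_DEG), N_ZONES - 1)


def build_index(records):
    """Group records (dicts with 'ra' and 'dec' in degrees) by declination zone."""
    index = defaultdict(list)
    for rec in records:
        index[zone_of(rec["dec"])].append(rec)
    return index


def separation_deg(ra1, dec1, ra2, dec2):
    """Angular separation in degrees, computed with the haversine formula."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def cone_search(index, ra, dec, radius_deg, keep=lambda rec: True):
    """Return records within radius_deg of (ra, dec) that also pass `keep`."""
    lo = zone_of(max(dec - radius_deg, -90.0))
    hi = zone_of(min(dec + radius_deg, 90.0))
    hits = []
    for zone in range(lo, hi + 1):         # only zones overlapping the cone
        for rec in index.get(zone, ()):
            if (separation_deg(ra, dec, rec["ra"], rec["dec"]) <= radius_deg
                    and keep(rec)):
                hits.append(rec)
    return hits
```

For instance, `cone_search(index, 10.0, 20.0, 0.1, keep=lambda r: r["mag"] < 18)` would return only the sources brighter than magnitude 18 inside a 0.1 degree cone. In a binary store the index would more plausibly map each zone to a byte offset in the compressed file, so that only the relevant blocks are read and decoded.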
When a query to such a large catalogue is submitted to VizieR, the CDS database of catalogues and tables (Ochsenbein et al. 2000), a dedicated program handles the request, extracts and decodes the matching data, and sends its output back to VizieR.
An overview of the system is shown in Figure 1.
Table 1 gathers some figures for queries performed on some large catalogues using a Sparc-20 station (72 MHz). On this machine, the complete USNO A1.0 is read in about 40 minutes.
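To put the full-scan figure in perspective, the short calculation below derives an effective read rate from the numbers quoted in the text. It assumes that the 40-minute scan runs over the 3.4 GB compressed file and that "GB" means 10^9 bytes; neither point is stated explicitly, so the result is only an order-of-magnitude estimate that includes both I/O and decoding.

```python
# Figures quoted in the text; reading "GB" as 10**9 bytes is an assumption,
# as is the idea that the full scan reads the compressed 3.4 GB file.
compressed_bytes = 3.4e9
read_seconds = 40 * 60

# Effective rate for a complete sequential scan, in megabytes per second.
throughput_mb_s = compressed_bytes / read_seconds / 1e6
print(f"effective scan rate: about {throughput_mb_s:.1f} MB/s")
```

Under these assumptions the rate is on the order of 1 MB/s, which shows why a positional query must avoid scanning the whole file.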
Thanks to the specific tools that have been developed, very large catalogues can be fully integrated in CDS services such as VizieR or Aladin.
For example, one can select in VizieR all the USNO sources around a given target, with additional constraints on the magnitudes. In Aladin, the interactive digitized sky atlas at CDS (Bonnarel et al. 2000), very large catalogues can be overlaid on digitized sky images just like any other catalogue, as illustrated in Figure 2.
The currently (November 1999) available large catalogues include:
For the DENIS project (Epchtein et al. 1999), beyond access to the point source catalogue, the CDS provides other on-line services, such as access to a database of observation strips (quality, number of sources, etc.).
Note that the DENIS and 2MASS catalogues are growing as new data become public.
Thanks to dedicated tools, large catalogues can now be handled as easily as the more "classical" ones. This will allow further integration of upcoming very large catalogues into CDS services.
Bonnarel, F., Fernique, P., Bienaymé, O. et al. 2000, A&A, in preparation
Epchtein, N., Deul, E., Derriere, S. et al. 1999, A&A, 349, 236
Jenkner, H., Lasker, B. M., Sturch, C. R. et al. 1990, AJ, 99, 2082
Lasker, B. M., Sturch, C. R., McLean, B. J. et al. 1990, AJ, 99, 2019
Monet, D., Bird, A., Canzian, B. et al. 1997, USNO-A1.0, (U.S. Naval Observatory, Washington DC)
Ochsenbein, F. 1998, in ASP Conf. Ser., Vol. 145, Astronomical Data Analysis Software and Systems VII, ed. R. Albrecht, R. N. Hook, & H. A. Bushouse (San Francisco: ASP), 387
Ochsenbein, F., Bauer, P., Genova, F. et al. 2000, A&A, in preparation
Russell, J. L., Lasker, B. M., McLean, B. J. et al. 1990, AJ, 99, 2059
Skrutskie, M. F., Schneider, S. E., Stiening, R. et al. 1997, Proc. Workshop "The Impact of Large Scale Near-IR Sky Surveys", 25