The SDSS (York et al. 2000) consists of four major components: a dedicated 2.5m telescope at Apache Point, New Mexico, along with a separate 50cm telescope used to monitor the extinction and to provide calibration patches for the main telescope; a large format imaging camera (Gunn et al. 1998) containing 30 photometric CCDs (each 13 × 13 arcmin) with u', g', r', i', and z' filters, and 24 astrometric and focus CCDs; two 320-fibre-fed double spectrographs, each with two CCDs; and lots and lots of software, with contributions from most of the SDSS institutions (listed in the acknowledgments).
The primary goals of the project are to survey the Northern Galactic Cap ( square degrees) in five bands to (PSF) limits of 22.3(u'), 23.3(g'), 23.1(r'), 22.3(i'), and 20.8(z'), and to carry out a spectroscopic survey of galaxies, QSOs, and a few stars.
The SDSS is now in operational mode, and as of this writing (late January 2001) has imaged some 1600 square degrees and obtained about 120,000 spectra as part of its commissioning and initial operations phases. These data have allowed dramatic new astronomical discoveries to be made, discoveries that we shall not further discuss here (e.g., Blanton et al. 2001; Fan et al. 2000, 2001; Fischer et al. 2000; Ivezić et al. 2000; Leggett et al. 2000).
The SDSS took an early decision to use public domain software wherever possible; in practice this has largely been applied to our infrastructure rather than scientific codes.
Our software engineering tools are entirely public domain (with the exception of compilers).
We adopted cvs as a source code manager and have been pleased with its performance. We currently have about 1.7GB in our cvs repository (including at least one version of IRAF). We have found that, after an initial period of distrust, scientists have found cvs to be extremely useful; in at least some cases, people sitting next to each other at the observatory in New Mexico have communicated via a cvs repository in Illinois.
While cvs allows us to control individual pieces of software, it does not provide a means of controlling complete systems. We have used a Fermi National Accelerator Laboratory (FNAL) utility called ups which allows us to associate a set of dependent products with a piece of our software. For example, version v5_2_10 of the image processing pipeline depends upon v7_15 of our infrastructure routines. This enables us to guarantee that at any time in the future we can reconstruct an entire system, using exactly the same bits and pieces. The particular versions (e.g., v5_2_10) correspond to tags in the cvs repository. We have adopted a procedure that stable versions of our pipelines correspond to branch tags in cvs; this has allowed us to proceed with development while giving us the ability to fix bugs found in the stable, delivered, code.
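The idea of pinning a complete system to exact versions can be illustrated with a minimal sketch. This is not ups syntax and does not reproduce how ups stores its data; the product names `image_pipeline` and `infrastructure`, the table layout, and the `resolve` helper are all hypothetical. Only the two version strings are taken from the text.

```python
# Hypothetical dependency table: (product, version) -> pinned dependencies.
# Only the two version strings come from the text; the product names and
# the table layout are illustrative and are not ups syntax.
DEPENDENCIES = {
    ("image_pipeline", "v5_2_10"): [("infrastructure", "v7_15")],
    ("infrastructure", "v7_15"): [],
}


def resolve(product, version, table=DEPENDENCIES):
    """Return every (product, version) pair needed to rebuild a release."""
    resolved = []
    stack = [(product, version)]
    while stack:
        item = stack.pop()
        if item in resolved:
            continue  # already scheduled; also guards against cycles
        if item not in table:
            raise KeyError(f"no dependency record for {item}")
        resolved.append(item)
        stack.extend(table[item])
    return resolved
```

Because each entry names an exact version, and each version corresponds to a tag in the source repository, resolving the same release at any later date yields the same list of sources to check out.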
We have used gnats as our problem report and bug database. Since July 1998 we have acquired 1799 entries in the database; the last thousand have been filed since February 2000.
We use a heavily enhanced version of TCL 7.4 (actually, of TCLX) as our command language. Much of the work developing this system (known as dervish, née shiva, Sergey et al. 1996) was carried out at FNAL.
In addition to what now appear to be basically cosmetic changes (which we regret), the major enhancements that we made were:
saoimage: We gratefully adopted saoimage, and added a facility to overlay bitmasks.
If we were starting this problem today, we would probably not use TCL (maybe python in its PyRAF incarnation?), and we would certainly make greater efforts to use vanilla, up-to-date, versions of our chosen system.
The SDSS has quite a large number of pipelines which must be run in order to fully process the data; we shall not discuss the spectroscopic reductions or the operational and scientific databases.
One major gain from splitting responsibilities in this way is that once we get to the frames pipeline, fields ( patches on the sky) may be processed independently and in any order.
The SDSS imaging pipelines employ a number of novel, and even interesting algorithms, which are slowly being written up for publication; for example, the image deblender (Lupton 2001). Here we shall only discuss a couple of features connected to handling the point spread function (PSF) and the related problem of star/galaxy separation.
Even in the absence of atmospheric inhomogeneities the SDSS telescope delivers images whose FWHMs vary by up to 15% from one side of a CCD to the other; the worst effects are seen in the chips furthest from the optical axis.
If the seeing were constant in time one might hope to understand these effects ab initio, but when coupled with time-variable seeing the delivered image quality is a complex two-dimensional function and we chose to model it heuristically using a Karhunen-Loève (KL) transform.
We also need to accurately know the PSF in order to be able to separate stars from galaxies; after all, the only valid discriminant that isn't based on colours or priors is that galaxies don't look like stars.
A good knowledge of the local PSF is also needed for all studies that measure the shapes of non-stellar objects (e.g., weak lensing studies, Fischer et al. 2000).
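The first step is to identify a set of reasonably bright, reasonably isolated stars from our image. We then use these stars to form a KL basis, retaining the first $n$ terms of the expansion:

$$P_{(i)}(u,v) = \sum_{r=1}^{n} a^r_{(i)} B_r(u,v),$$

where $P_{(i)}$ denotes the $i$th PSF star, the $B_r$ are the KL basis functions, and $u,v$ are pixel coordinates relative to the origin of the basis functions. Once we know the $B_r$ we can write

$$a^r_{(i)} \approx \sum_{l+m \le N} b^r_{lm} \, x^l_{(i)} \, y^m_{(i)},$$

where $x_{(i)}, y_{(i)}$ are the coordinates of the centre of the $i$th star and $N$ is the highest power of $x$ or $y$ included in the expansion.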
In fact, rather than use only the stars from a single frame to determine that frame's PSF, we include stars from both preceding and succeeding frames in the fit. This has the advantage that the spatial variation is better constrained at the leading and trailing edges of the frame; that the PSF variation is smoother from frame to frame; and that we have more stars available to determine the PSF.
We have found that optimal results are obtained by using a range of frames to determine the KL basis functions and frame to follow the spatial variation of the PSF. If we try to use a larger window we find that variation of the coefficients is not well described by the polynomials that we have assumed. We have not tried using a different set of expansion functions (e.g., a Fourier series).
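The two stages described in this section (a KL basis built from star images, then a polynomial fit of the expansion coefficients against position) can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the pipeline's code: the function names are hypothetical, the basis is taken from a singular value decomposition of peak-normalised stamps without mean subtraction, and the polynomial coefficients are found by an ordinary least-squares fit to the per-star KL coefficients.

```python
import numpy as np


def kl_basis(stamps, n_terms):
    """Return the first n_terms KL basis images and the per-star coefficients.

    stamps: array of shape (n_stars, ny, nx), one postage stamp per star.
    """
    n_stars, ny, nx = stamps.shape
    flat = stamps.reshape(n_stars, -1)
    # Normalise each star to unit peak so bright stars do not dominate.
    flat = flat / flat.max(axis=1, keepdims=True)
    # The leading right singular vectors serve as the KL basis images.
    _, _, vt = np.linalg.svd(flat, full_matrices=False)
    basis = vt[:n_terms]
    coeffs = flat @ basis.T                      # shape (n_stars, n_terms)
    return basis.reshape(n_terms, ny, nx), coeffs


def poly_terms(x, y, order):
    """Design matrix of monomials x**l * y**m with l + m <= order."""
    return np.stack([x**l * y**m
                     for l in range(order + 1)
                     for m in range(order + 1 - l)], axis=1)


def fit_spatial_variation(coeffs, x, y, order):
    """Least-squares polynomial fit of each KL coefficient against position."""
    design = poly_terms(x, y, order)
    b, *_ = np.linalg.lstsq(design, coeffs, rcond=None)
    return b                                     # shape (n_monomials, n_terms)


def psf_at(basis, b, x, y, order):
    """Reconstruct the model PSF image at position (x, y)."""
    terms = poly_terms(np.atleast_1d(float(x)), np.atleast_1d(float(y)), order)
    a = (terms @ b)[0]                           # KL coefficients at (x, y)
    return np.tensordot(a, basis, axes=1)
```

In this sketch, pooling stars from neighbouring frames would amount to calling `kl_basis` on the stamps from a wider window of frames and `fit_spatial_variation` on a narrower one.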
We fit three models to every object, in every band: a PSF, a pure de Vaucouleurs profile, and an exponential disk; the galaxy models are convolved with the local PSF (as estimated using the KL expansion of the previous section). This is potentially an expensive operation as it involves a 3-dimensional ( ) non-linear minimisation; each iteration requires the calculation of a 2-D analytical model of a galaxy followed by convolution with the PSF and the calculation of χ² by summing over many pixels of the image. We make heavy use of pre-calculated tables of models, and pre-extract the radial profile into a series of annuli, each containing twelve 30° sectors; in consequence fitting a single galaxy model in a single band takes of order 1.5ms on an 800MHz alpha.
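Pre-extracting the profile into annuli of twelve 30° sectors reduces each object to a small table of numbers, so each fitting iteration compares a few dozen cells rather than many pixels. A minimal sketch of such an extraction is given below; the function name, the choice of a simple mean per cell, and the annulus radii are assumptions made for illustration and are not taken from the pipeline.

```python
import numpy as np


def extract_sector_profile(image, x0, y0, radii, n_sectors=12):
    """Mean pixel value in each (annulus, sector) cell around (x0, y0).

    radii: increasing annulus boundaries in pixels (hypothetical values).
    Returns an array of shape (len(radii) - 1, n_sectors); empty cells are NaN.
    """
    yy, xx = np.indices(image.shape)
    dx, dy = xx - x0, yy - y0
    r = np.hypot(dx, dy)
    # Position angle in [0, 360), then the index of the 30-degree sector.
    theta = np.degrees(np.arctan2(dy, dx)) % 360.0
    sector = (theta // (360.0 / n_sectors)).astype(int) % n_sectors
    # Index of the annulus containing each pixel (-1 or n_ann if outside).
    annulus = np.searchsorted(radii, r, side="right") - 1

    n_ann = len(radii) - 1
    inside = (annulus >= 0) & (annulus < n_ann)
    cell = annulus[inside] * n_sectors + sector[inside]
    total = np.bincount(cell, weights=image[inside],
                        minlength=n_ann * n_sectors)
    count = np.bincount(cell, minlength=n_ann * n_sectors)
    with np.errstate(invalid="ignore", divide="ignore"):
        profile = total / count
    return profile.reshape(n_ann, n_sectors)
```

A tabulated model evaluated on the same cells can then be compared with this array directly.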
The primary use of these models is in star/galaxy separation and morphological classification of galaxies. We initially hoped to use the relative likelihoods of the PSF and galaxy fits to separate stars from galaxies, but found that the stellar likelihoods were tiny for bright stars, where the photon noise in the profiles is small, due to the influence of slight errors in modelling the PSF. Instead we found the ratio of the flux in the best-fit galaxy model to that in the PSF to be an excellent discriminant.
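The flux-ratio discriminant described above can be sketched in a few lines. An extended object has light outside the PSF core, so its best-fit galaxy model carries more flux than a PSF fit does, while for a true point source the two agree. The function name and the threshold value below are placeholders chosen for illustration; the text does not give the cut that is actually applied.

```python
def classify(psf_flux, model_flux, threshold=1.15):
    """Label an object by the ratio of best-fit galaxy-model flux to PSF flux.

    The threshold is a placeholder chosen for illustration; it is not the
    value used by the SDSS pipeline.
    """
    if psf_flux <= 0:
        raise ValueError("PSF flux must be positive")
    return "galaxy" if model_flux / psf_flux > threshold else "star"
```

Unlike a likelihood comparison, this ratio does not become more sensitive to small PSF modelling errors as the object gets brighter, which is the failure described above for bright stars.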
Figure 2 shows a colour-magnitude diagram from a small area of SDSS imaging data. The top left panel shows only objects classified as stars; note that most objects with colours of are preferentially classified as galaxies. The star/galaxy separation is independent of the object's colours, so this rejection must be a measure of how well the star/galaxy classification is working.
Studies of the performance of the SDSS S/G separation in the Groth strip data (where accurate classification is available from HST imaging) indicate that separation is reliable to at least a of 21.5 in data that has a limit of .
The colour of galaxies is a good discriminant of Hubble type (Strateva et al. 2001). Figure 4 plots colour against what is essentially the likelihood ratio for the de Vaucouleurs and exponential models; it shows that the galaxy likelihoods provide clear morphological classification to , in data with a PSF limit of about 22.5.
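As a hedged illustration of what a likelihood ratio between the two galaxy models looks like, the sketch below assumes Gaussian pixel errors and models with the same number of free parameters, in which case the log-likelihood ratio is half the difference of the two χ² values. The function names are hypothetical, and the exact quantity plotted in Figure 4 is not specified here.

```python
import math


def log_likelihood_ratio(chi2_dev, chi2_exp):
    """ln(L_deV / L_exp), assuming Gaussian errors and equal parameter counts."""
    return -0.5 * (chi2_dev - chi2_exp)


def dev_weight(chi2_dev, chi2_exp):
    """Relative weight of the de Vaucouleurs fit, L_deV / (L_deV + L_exp)."""
    llr = log_likelihood_ratio(chi2_dev, chi2_exp)
    # Two algebraically equal forms, chosen so that exp() never overflows.
    if llr >= 0:
        return 1.0 / (1.0 + math.exp(-llr))
    e = math.exp(llr)
    return e / (1.0 + e)
```

A weight near 1 favours the de Vaucouleurs profile and a weight near 0 favours the exponential disk.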
As far as he knows, this section represents the views only of the primary author and not of his coauthors. Those of you who know him will have heard these opinions before.
The SDSS has been very challenging technically, scientifically, and managerially. In all categories the software stands out: The hardest technical aspect of building the SDSS was probably the software, although building the mosaic camera wasn't easy; some of the software was a major scientific challenge; and the software was undoubtedly the hardest part of the project to manage.
Let me expand upon some of these issues. We have found it extremely hard to hire good people to work on astronomical software. There is no career path within the universities for software specialists, despite the fact that there's no logical distinction between building hard- and soft-ware instruments. Smart and sensible graduate students, desirous of a career in astronomy, simply don't choose to specialise in the software required to reduce modern observational datasets.
Hiring computer professionals is not the solution to this problem. Besides being (if competent) too expensive for the average astronomical project, they simply don't possess the skills needed to solve the scientific challenges posed by astronomical data. We need scientists to resolve scientific problems, albeit with support from people whose job it is to know about optimizers, LALR(1) grammars, and good software engineering practices. We also need our software-scientists to be in rich scientific environments, where they can talk with (say) the quasar-scientists about the data analysis that they are carrying out.
If we, as a community, knew how to reuse software from one project on another some of these problems might be alleviated, but I don't believe that they would go away. The availability of good numerical libraries hasn't made the development of new cosmological codes stop; the impetus for change comes from the desire to do things better, not just from the not-invented-here syndrome.
I believe that part of the problem is that we, as a community, have not yet faced the reality that software is difficult, and that the dynamic range between the really good and the average programmer is as great as that between Lyman Spitzer and the average graduate student. This makes management difficult; imagine trying to get a collaboration of 100 self-opinionated astronomers to agree about the best way to solve a problem, and tell me why this is any easier than running a large modern collaboration involving large amounts of software. I reluctantly believe that we must learn to run large software projects (and all large projects nowadays are large software projects) as benevolent dictatorships, of course with the implicit hope that I shall be the dictator (but not the manager).
The Sloan Digital Sky Survey (SDSS) is a joint project of The University of Chicago, Fermilab, the Institute for Advanced Study, the Japan Participation Group, The Johns Hopkins University, the Max-Planck-Institute for Astronomy, New Mexico State University, Princeton University, the United States Naval Observatory, and the University of Washington. Apache Point Observatory, site of the SDSS telescopes, is operated by the Astrophysical Research Consortium (ARC).
Funding for the project has been provided by the Alfred P. Sloan Foundation, the SDSS member institutions, the National Aeronautics and Space Administration, the National Science Foundation, the U.S. Department of Energy, Monbusho, and the Max Planck Society.
The SDSS Web site is http://www.sdss.org/.
Blanton, M., et al. 2001, The Luminosity Function of Galaxies in SDSS Commissioning Data, submitted to AJ
Fan, X., et al. 2000, AJ, 120, 1167
Fan, X., et al. 2001, AJ, 121, 54
Fischer, P., et al. 2000, AJ, 120, 1198
Gunn, J. E., et al. 1998, AJ, 116, 3040
Ivezić, Ž., et al. 2000, AJ, 120, 963
Leggett, S. K., et al. 2000, ApJ, 536, L35
Lupton, R. H., et al. 2001, SDSS Image Processing I: The Deblender, submitted to AJ
Sergey, G., et al. 1996, in ASP Conf. Ser., Vol. 101, Astronomical Data Analysis Software and Systems V, ed. G. H. Jacoby & J. Barnes (San Francisco: ASP), 248
Strateva, I., et al. 2001, in preparation
York, D. G., et al. 2000, AJ, 120, 1579