One of the fundamental goals of the Virtual Observatory is to enable astronomical services to be used cooperatively to answer potentially very complex scientific questions. The idea is that these services would build on top of each other, much like the packages in IRAF. The core services would provide access to astronomical data and publish certain basic tools, while higher-level services would perform more sophisticated scientific analyses that rely on results from the core services. Such a hierarchy of services would provide a standard interface to all public astronomical resources.
XML Web Services is an industry standard (W3C) that suits the needs of the astronomical community very well. It is built on other standards such as XML, XSD, SOAP and WSDL, which guarantee interoperability between platforms and independence from programming languages. Anyone can implement and consume Web Services using practically any computer and language. Two fully functional development environments currently exist, one for Java and one for the .NET Framework, and both make programming Web Services very easy.
Here we describe a prototype distributed query system for the VO: a hierarchy of Web Services that federates astronomical databases that may be located anywhere in the world.
SkyQuery is a network of Web Services. The Portal provides the entry point into the distributed query system and relies on the metadata and query services of the database SkyNodes. The SkyNodes are the individual databases located at different sites, together with their Web Service wrappers. At present, three SkyNodes are linked into SkyQuery: (1) SDSS, (2) 2MASS and (3) FIRST. Because the SkyNodes are registered with the Portal, the complexity of the network can be completely hidden from the user (see Figure 1). A sample user interface, implemented as a Web application on the project site, can submit queries, search the metadata and render the returned XML DataSet as an HTML table. The client Web application also uses the Sloan Digital Sky Survey's Image Cutout Web service to display a composite color image of the area of sky specified in the query.
The primary entry point to SkyQuery is a Web method that accepts a request in an extended SQL format. The syntax extensions were needed to specify the target archives and the area on the sky, and to parameterize the probabilistic cross-matching algorithm. Figure 2 shows a sample query.
The data nodes publish functionality that the Portal consumes programmatically. These methods provide access to the data and metadata of the archive. Anyone can publish their data through SkyQuery by implementing a few SOAP methods, regardless of how the data are stored. In fact, the 3 methods below are currently the only requirements for registering a SkyNode:
How does it really work? The Portal receives a request and parses the query. After locating the referenced SkyNodes, it submits a simple SQL query in parallel to every SkyNode using the Query() method, to get an estimate of the number density of objects satisfying the selection criteria. For example, the sample query in Figure 2 looks for galaxies in the SDSS survey (o.type=3) matched with 2MASS objects fainter than 14th magnitude in the J band (t.j_m>14). Based on these estimates, the Portal arranges the SkyNodes into an execution plan that minimizes the network traffic of the distributed query. The Portal then simply executes the plan by calling the XMatch() method of the first SkyNode in the ``stack'' and waits for the results to come back from the SkyNodes, which it relays back to the user.
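The text does not state the exact rule used to order the SkyNodes. A minimal sketch, assuming the simple heuristic that the node with the fewest candidate objects sits at the bottom of the stack (so the first data shipped between nodes is the smallest set), could look like this; `build_plan` and the counts are hypothetical.

```python
from typing import Dict, List


def build_plan(counts: Dict[str, int]) -> List[str]:
    """Order SkyNode names from the top of the stack (called first by the
    Portal via XMatch()) to the bottom (runs a plain local SQL query).

    Assumed heuristic: largest estimated count on top, smallest at the
    bottom, so the data that starts streaming first is the smallest set.
    """
    # Sort by estimated count, largest first; ties broken by name so the
    # plan is deterministic.
    return sorted(counts, key=lambda name: (-counts[name], name))


# Made-up estimates, standing in for the results of the Query() calls.
estimates = {"SDSS": 120_000, "2MASS": 45_000, "FIRST": 800}
plan = build_plan(estimates)
print(plan)  # ['SDSS', '2MASS', 'FIRST']
```

A real planner would likely also weigh row widths and link speeds; counts alone keep the sketch short.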
The first SkyNode (SkyNode 1 in Figure 3) looks at the plan and decides whether it has the information to satisfy the request. If not, it recursively calls the next SkyNode with a simpler execution plan, and so on, until the last node in the plan (SkyNode 3) sees a simple SQL query that it can run against its local database. These requests pass only very lightweight objects over the wire; real data start streaming from one SkyNode to another only on the way back up. Having received the data from the bottom data node, SkyNode 2 can do its job: first it matches the catalogs using the astrometric precisions and probabilistic thresholds, then it applies the selection criteria and returns the data one level up. All of this is carried out within the database. Only the columns that were selected by the user or that are needed to cross-identify the catalogs are propagated. Mixed constraints, e.g., (o.i-t.j_m)<1, are applied as soon as they can be evaluated. The result from the top-level SkyNode is sent to the user.
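The recursive evaluation of the plan can be sketched with in-memory tables standing in for the SkyNode databases. This is an illustration under stated assumptions, not the SkyQuery implementation: `SkyNode`, `xmatch`, `apply_ready` and the data are hypothetical, a fixed match radius replaces the probabilistic thresholds, and local selection criteria are applied before matching for simplicity.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple

Row = Dict[str, float]
# A cross-catalog ("mixed") constraint: the columns it needs and a predicate.
Mixed = Tuple[Set[str], Callable[[Row], bool]]


def separation_arcsec(ra1: float, dec1: float, ra2: float, dec2: float) -> float:
    """Great-circle separation in arcseconds (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h)))) * 3600.0


def apply_ready(rows: List[Row], mixed: List[Mixed]) -> List[Row]:
    """Apply every mixed constraint whose columns are all present."""
    if not rows:
        return rows
    ready = [pred for cols, pred in mixed if cols.issubset(rows[0])]
    return [r for r in rows if all(pred(r) for pred in ready)]


@dataclass
class SkyNode:
    alias: str                    # table alias in the query, e.g. "o" or "t"
    table: List[Row]              # in-memory stand-in for the local database
    where: Callable[[Row], bool]  # local selection criteria
    radius: float = 3.0           # hypothetical fixed match radius (arcsec)

    def local_query(self) -> List[Row]:
        """Run the local selection and prefix column names with the alias."""
        return [{f"{self.alias}.{k}": v for k, v in r.items()}
                for r in self.table if self.where(r)]

    def xmatch(self, rest: List["SkyNode"], mixed: List[Mixed]) -> List[Row]:
        """Recurse down the stack, then cross-match on the way back up."""
        if not rest:  # bottom of the stack: just a plain local query
            return apply_ready(self.local_query(), mixed)
        incoming = rest[0].xmatch(rest[1:], mixed)  # simpler plan for next node
        below = rest[0].alias
        out = []
        for mine in self.local_query():
            for row in incoming:
                d = separation_arcsec(mine[f"{self.alias}.ra"], mine[f"{self.alias}.dec"],
                                      row[f"{below}.ra"], row[f"{below}.dec"])
                if d <= self.radius:
                    out.append({**row, **mine})  # carry both sets of columns up
        return apply_ready(out, mixed)


# Toy data loosely following the Figure 2 example (values are invented).
sdss = SkyNode("o", [{"ra": 10.0, "dec": 0.0, "type": 3, "i": 15.0},
                     {"ra": 20.0, "dec": 0.0, "type": 6, "i": 14.0}],
               where=lambda r: r["type"] == 3)
twomass = SkyNode("t", [{"ra": 10.0003, "dec": 0.0, "j_m": 14.5},
                        {"ra": 20.0, "dec": 0.0, "j_m": 15.0}],
                  where=lambda r: r["j_m"] > 14)
mixed = [({"o.i", "t.j_m"}, lambda r: r["o.i"] - r["t.j_m"] < 1)]
result = sdss.xmatch([twomass], mixed)  # 2MASS is the bottom of the stack
```

Unlike SkyQuery, this sketch carries every column upward rather than only the selected and matching ones, and it compares positions with a brute-force double loop where a database would use a spatial index.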
For extra credit, the system can calculate the best positions of objects from the positions measured by the individual surveys, and it can also quote a probability for each match. The Web application on the project site automatically adds columns for these parameters to the result table.
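The text does not give the formulas used. One standard approach, assumed here rather than taken from SkyQuery, treats each survey position as a unit vector with a Gaussian error σ_i: the best position is the normalized inverse-variance weighted sum of the vectors (this minimizes the weighted squared distances on the sphere), and the resulting χ² has roughly 2(n-1) degrees of freedom for n surveys. `best_position` and `chi2_sf_even` are hypothetical names.

```python
import math
from typing import List, Tuple


def to_vec(ra_deg: float, dec_deg: float) -> Tuple[float, float, float]:
    """Unit vector on the celestial sphere."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra), math.cos(dec) * math.sin(ra), math.sin(dec))


def best_position(obs: List[Tuple[float, float, float]]) -> Tuple[float, float, float]:
    """obs: (ra_deg, dec_deg, sigma_arcsec) per survey -> (ra_deg, dec_deg, chi2)."""
    weights = [1.0 / math.radians(s / 3600.0) ** 2 for _, _, s in obs]
    vecs = [to_vec(ra, dec) for ra, dec, _ in obs]
    # Weighted sum of unit vectors, renormalized onto the sphere: this
    # minimizes sum_i w_i |x_i - x|^2 subject to |x| = 1.
    s = [sum(w * v[k] for w, v in zip(weights, vecs)) for k in range(3)]
    norm = math.sqrt(sum(c * c for c in s))
    x = [c / norm for c in s]
    chi2 = sum(w * sum((v[k] - x[k]) ** 2 for k in range(3))
               for w, v in zip(weights, vecs))
    ra = math.degrees(math.atan2(x[1], x[0])) % 360.0
    dec = math.degrees(math.asin(max(-1.0, min(1.0, x[2]))))
    return ra, dec, chi2


def chi2_sf_even(chi2: float, dof: int) -> float:
    """P(X >= chi2) for a chi-square variable with an even number of dof."""
    half = chi2 / 2.0
    term, total = 1.0, 1.0
    for j in range(1, dof // 2):
        term *= half / j          # (chi2/2)^j / j!
        total += term
    return math.exp(-half) * total


# Two detections 2 arcsec apart, each with a 1 arcsec error (invented values).
ra, dec, chi2 = best_position([(10.0, 0.0, 1.0), (10.0 + 2 / 3600, 0.0, 1.0)])
p_match = chi2_sf_even(chi2, 2 * (2 - 1))
```

The midpoint comes out at RA 10° + 1″ with χ² close to 2, giving a probability near e⁻¹; a small-angle, Gaussian-error approximation underlies the degrees-of-freedom count.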
The observations of large all-sky surveys are stored in separate databases, and because the technology changes rapidly, these data sets cannot be built into a centralized system. SkyQuery federates such databases using XML Web Services. It can cross-match many catalogs on the fly using a probabilistic fuzzy join, or it can look for drop-outs in certain catalogs. SkyQuery demonstrates that astronomical services can be implemented effectively as Web Services.
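A drop-out search is essentially an anti-join: keep the objects of one catalog that have no counterpart in another. As a hedged sketch, with a fixed radius standing in for the probabilistic criterion and hypothetical names throughout:

```python
import math
from typing import Dict, List

Obj = Dict[str, float]  # each object needs "ra" and "dec" in degrees


def separation_arcsec(a: Obj, b: Obj) -> float:
    """Great-circle separation in arcseconds (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (a["ra"], a["dec"], b["ra"], b["dec"]))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h)))) * 3600.0


def dropouts(primary: List[Obj], other: List[Obj], radius_arcsec: float) -> List[Obj]:
    """Objects in `primary` with no counterpart in `other` within the radius.
    The brute-force scan stands in for a spatially indexed database query."""
    return [p for p in primary
            if not any(separation_arcsec(p, o) <= radius_arcsec for o in other)]


# Invented positions: the first object has a counterpart 0.72 arcsec away.
optical = [{"ra": 10.0, "dec": 0.0}, {"ra": 10.1, "dec": 0.0}]
radio = [{"ra": 10.0002, "dec": 0.0}]
missing = dropouts(optical, radio, radius_arcsec=3.0)
print(missing)  # [{'ra': 10.1, 'dec': 0.0}]
```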
SkyQuery is a work in progress. Planned enhancements to the prototype demonstrated here include support for complex area specifications and a more flexible, advanced query language that, for example, allows local table joins. Next, survey footprint services will be added to the SkyNodes, along with dynamic registration of SkyNodes with the Portal.
Additional SkyNodes will be added soon. A new SkyNode is being developed at the Institute of Astronomy in Cambridge, UK, to publish the Wide Field Survey catalog of the Isaac Newton Telescope.
This work is supported in part by NASA AISRP 2001 grant NRA-00-01-AISR-035.