
Hanisch, R., Greene, G., Linde, A., Plante, R., Richards, A. M. S., Auden, E., Noddle, K. T., & O'Mullane, W. 2003, in ASP Conf. Ser., Vol. 314 Astronomical Data Analysis Software and Systems XIII, eds. F. Ochsenbein, M. Allen, & D. Egret (San Francisco: ASP), 273

Resource Metadata for the Virtual Observatory

R. J. Hanisch, G. Greene
Space Telescope Science Institute

A. E. Linde, K. T. Noddle
University of Leicester

R. L. Plante
National Center for Supercomputing Applications, University of Illinois Urbana-Champaign

A. M. S. Richards
Jodrell Bank Observatory

E. C. Auden
Mullard Space Science Laboratory

W. O'Mullane
The Johns Hopkins University

Abstract:

The location and access methods of astronomical resources (catalogs, observation logs, and data archives) and associated computational services (e.g., data processing pipelines, source extraction services, theoretical simulations) in the Virtual Observatory will be determined by querying dynamic resource registries. These registries function as a sort of yellow pages, providing descriptive information (metadata) about the resources in order to locate information and services in response to user queries. The metadata also need to describe the provenance of the information, provide some indication of the data quality, quantity, and type, and guide users to information appropriate to their needs (e.g., research-oriented data archives vs. educational resources).

1. The Role of Metadata in the Virtual Observatory

In order to make it easy for astronomy information services to participate in the VO, we propose a system for metadata management based on a hierarchy of descriptive schemas. At the top level we require a minimum amount of information, sufficient primarily to note the existence of a resource and to describe who is responsible for it. At lower levels, the metadata are more extensive and complex, allowing for the description of query syntax, access protocols, and usage policies.
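As a rough illustration of this hierarchy, the sketch below models a minimal top-level record and a more detailed lower-level record that extends it. The class and field names (`CoreMetadata`, `ServiceMetadata`, `responsible_party`, and so on) are hypothetical and chosen only for this example; they are not the element names of the proposed schemas.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class CoreMetadata:
    """Top level: enough to note that a resource exists and who is responsible.

    Field names are assumptions made for this sketch.
    """
    identifier: str
    title: str
    responsible_party: str


@dataclass
class ServiceMetadata(CoreMetadata):
    """Lower level: optional, more detailed descriptive fields."""
    access_protocol: Optional[str] = None
    query_syntax: Optional[str] = None
    usage_policy: Optional[str] = None


def missing_core_fields(record: dict) -> list:
    """Return the names of top-level fields that are absent or empty."""
    return [f.name for f in fields(CoreMetadata) if not record.get(f.name)]


minimal = {
    "identifier": "example/resource-1",
    "title": "Example catalog",
    "responsible_party": "Example Observatory",
}
print(missing_core_fields(minimal))                # nothing missing
print(missing_core_fields({"title": "No owner"}))  # identifier and owner missing

# The same minimal record can be extended with lower-level detail later.
detailed = ServiceMetadata(**minimal, access_protocol="cone search")
print(detailed.access_protocol)
```

Keeping the required set small means a provider can register a resource with very little effort, then add the lower-level fields when the service details are known.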

A resource is a general term referring to any VO entity that can be described and which can be given a name and unique identifier. Just about anything can be a resource: it can be an abstract idea, such as sky coverage or an instrumental setup, or it can be fairly concrete, like an organization or a data collection. This definition is consistent with its use in the general Web community as ``anything that has an identity'' (Berners-Lee et al. 1998). We expand on this definition by saying that it is also describable.

An organization is a specific type of resource that brings people together to participate in VO applications. Organizations can be hierarchical and range greatly in size and scope. At a high level, an organization could be a university, observatory, or government agency. At a finer level, it could be a specific scientific project, space mission, or individual researcher. A provider is an organization that makes data and/or services available to users over the network.

A service is any VO resource that can be invoked by a user or software agent to perform some action on their behalf. Associated with any service are descriptive metadata about the service. These metadata generally include the information a user needs to determine whether a service is of interest and how the service may be invoked. Specific types of metadata are described below. Note that the service itself need not be aware of the metadata that describe it.

A query service supports a query/response protocol. The user submits a query to the service that may define characteristics of interest, and the service returns a set of information to the user. The query may be null, e.g., a current-time service may only support a null query, and some services may respond to a null query with appropriate default actions. Non-query services may also exist, e.g., services to copy or delete files on remote file systems, to mail information to other users, to kill existing jobs, to authorize actions, etc.
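The query/response behavior described above, including the two ways a null query can be handled, might be sketched as follows. Both classes and their method names are hypothetical illustrations rather than part of any VO protocol; the catalog rows are made-up placeholder values.

```python
from datetime import datetime, timezone
from typing import Optional


class CurrentTimeService:
    """A query service that supports only the null query."""

    def query(self, query: Optional[dict] = None) -> dict:
        if query:
            raise ValueError("this service accepts only a null query")
        return {"utc": datetime.now(timezone.utc).isoformat()}


class CatalogService:
    """A query service that answers a null query with a default action."""

    def __init__(self, rows: list, default_limit: int = 2):
        self.rows = rows
        self.default_limit = default_limit

    def query(self, query: Optional[dict] = None) -> list:
        if not query:
            # Null query: fall back to a default action (the first few rows).
            return self.rows[: self.default_limit]
        # Otherwise return the rows matching every requested characteristic.
        return [r for r in self.rows
                if all(r.get(k) == v for k, v in query.items())]


# Placeholder rows, invented for the example.
rows = [
    {"name": "src-1", "band": "optical"},
    {"name": "src-2", "band": "radio"},
    {"name": "src-3", "band": "optical"},
]
catalog = CatalogService(rows)
print(catalog.query())                   # default action for a null query
print(catalog.query({"band": "radio"}))  # response filtered by the query
print(CurrentTimeService().query())      # null query is the only valid input
```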

A registry is a service which aggregates and serves resource metadata. The metadata may be added to the registry via an input form or harvested from the resources themselves. A registry may serve all resource metadata (full registry), select types of resource (limited registry) or resources at a specific location (local registry). Any registry may also support a query interface which will allow searching for resources based on various combinations of metadata values.
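A minimal in-memory sketch of these registry behaviors is given below, assuming that each metadata record is a plain dictionary with `identifier` and `type` keys. The `Registry` class, its method names, and the sample records are all hypothetical; a real registry would persist records and harvest them over the network.

```python
from typing import Iterable, Optional


class Registry:
    """Aggregates and serves resource metadata (illustrative only)."""

    def __init__(self, accepted_types: Optional[set] = None):
        # None means a full registry; a set of types makes it a limited one.
        self.accepted_types = accepted_types
        self.records = {}

    def register(self, record: dict) -> bool:
        """Add one record, as an input form would. Returns True if stored."""
        if (self.accepted_types is not None
                and record.get("type") not in self.accepted_types):
            return False
        self.records[record["identifier"]] = record
        return True

    def harvest(self, source: Iterable) -> int:
        """Pull records from a resource or another registry; count stored."""
        return sum(1 for record in source if self.register(record))

    def search(self, **criteria) -> list:
        """Return records matching every given metadata value."""
        return [r for r in self.records.values()
                if all(r.get(k) == v for k, v in criteria.items())]


# Made-up records standing in for harvested metadata.
published = [
    {"identifier": "ex/cone-1", "type": "service", "protocol": "cone search"},
    {"identifier": "ex/org-1", "type": "organization"},
    {"identifier": "ex/sia-1", "type": "service", "protocol": "image access"},
]

full = Registry()
limited = Registry(accepted_types={"service"})
print(full.harvest(published), limited.harvest(published))
print(full.search(type="service", protocol="cone search"))
```

A local registry could be modeled the same way by filtering on a location field instead of the resource type.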

A sample of the metadata that would be used to describe the Sloan Digital Sky Survey source catalog as hosted at the Space Telescope Science Institute is shown in Fig. 1. Further information concerning the encoding of such metadata and their incorporation into resource registries is described by Plante et al. (2004) and Greene et al. (2004).

Figure 1: Sample resource metadata. Dublin Core elements are shown in bold, and required elements are shown in italics. (Bold italics indicate required elements that are also in the Dublin Core.) See http://dublincore.org for more information about Dublin Core metadata.
[Figure 1 body: a two-column table of metadata element names and values. Only fragments survive extraction: a group heading beginning "Identity metada...", a trailing value of 0.2 from an unidentified row, and the row Service.MaxReturnRecords = 5000.]

2. Lessons Learned and Questions Raised in Populating a Prototype Registry

Both the NVO and AstroGrid projects have implemented prototype registries. The NVO prototype has been used as a data discovery engine for the Data Inventory Service (http://heasarc.gsfc.nasa.gov/vo/data-inventory.html, McGlynn et al. 2004). The prototype registry was constructed primarily through manual entry of metadata about known cone search and Simple Image Access Protocol services. It took about a week to populate a prototype registry of ~100 resources. During this time period, we recognized certain patterns in data entry as well as inconsistencies in metadata descriptions. This experience leads to the following conclusions and questions:

In addition, the resource metadata concepts described here must be encoded and structured in machine-readable registries. Work continues on XML schemas that more fully show the relationships among metadata elements and that simplify data entry and maintenance efforts (e.g., by allowing an organization to register its curation-related metadata once and apply it to a number of different collections).
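The idea of registering curation metadata once and applying it to several collections can be sketched as a simple reference between records. This is an illustration in Python dictionaries, not the XML schema under development; the `curation_ref` key, the field names, and all values are hypothetical.

```python
# Curation metadata registered once per organization (made-up values).
curation = {
    "org/example-archive": {
        "publisher": "Example Archive",
        "contact": "help@example.org",
    }
}

# Each collection points at the shared curation record instead of copying it.
collections = [
    {"identifier": "org/example-archive/catalog-a", "title": "Catalog A",
     "curation_ref": "org/example-archive"},
    {"identifier": "org/example-archive/catalog-b", "title": "Catalog B",
     "curation_ref": "org/example-archive"},
]


def resolve(collection: dict, curation_records: dict) -> dict:
    """Merge the shared curation fields into one collection's metadata."""
    shared = curation_records[collection["curation_ref"]]
    own = {k: v for k, v in collection.items() if k != "curation_ref"}
    # Collection-specific values take precedence over shared ones.
    return {**shared, **own}


resolved = [resolve(c, curation) for c in collections]
for record in resolved:
    print(record["title"], "|", record["publisher"], "|", record["contact"])
```

With this arrangement, a change to the contact address is made in one place and appears in every collection the next time its metadata are resolved.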

References

Berners-Lee, T., Fielding, R., & Masinter, L. 1998, IETF RFC2396, http://asg.web.cmu.edu/rfc/rfc2396.html

Greene, G., O'Mullane, W., Hanisch, R., & Gaffney, N. 2004, this volume, 285

McGlynn, T., Lee, J., Hanisch, R., O'Mullane, W., & Greene, G. 2004, this volume, 319

Plante, R., Greene, G., Hanisch, R., McGlynn, T., O'Mullane, W., Williams, R., & Williamson, R. 2004, this volume, 585


© Copyright 2004 Astronomical Society of the Pacific, 390 Ashton Avenue, San Francisco, California 94112, USA