AstroBrowse Workshop II
13/14 December 1995
Space Telescope Science Institute

Astro Profile for Search and Retrieval

Version 0.3
Draft

A/WWW Enterprises
Archibald Warnock
785 Paul Birch Drive
Crownsville, MD 21032

Phone/FAX 410-923-1009
E-mail: warnock@awcubed.com

Introduction

The purpose of this document is to describe the common syntax and terminology used to specify a search to any of the participating AstroBrowse Data Centers. It is being developed in conjunction with the series of AstroBrowse workshops, sponsored by NASA.

The AstroBrowse participants have reached consensus on a number of issues related to establishing a search profile for astronomical data, and are now proceeding with a basic level of implementation.

For the time being, we envision that the search and retrieval syntax will be encoded in http protocol streams, derived directly from HTML forms. The syntactic content, however, is not intended to be specific to the http protocol. It might be desirable in the future to map the syntax to other protocols.

In general, to specify a search, a number of items ("attributes") must be specified or assumed. The end user specifies a search term. The search system must then know how that term is to be processed: which database field it is compared against (the "Use Attribute") and which comparison operator applies (the "Relation Attribute"). Other protocols define additional attributes (position, structure, truncation and completeness), but these are not deemed necessary at this time for the Astro profile.

The ASTRO Search Profile

Table 1 contains a list of valid values for Use Attributes. These attributes map to searchable fields in many, if not all, existing on-line data systems, and appear to define a minimal set of desirable search criteria. Table 2 contains a list of valid Relation Attributes. These define the way in which the user's search term and the values in the database are to be compared.

Both of these fields are encoded as numeric values to avoid possible ambiguities in spelling, case sensitivity and punctuation.

Table 1
Minimal Use Attributes
Use/Field Value
Name 1
RA 2
Dec 3
Radius 4
Data Class(*) 5
Data Type(*) 6
Bandpass(*) 7
Time 8
Observatory/Mission/Project(*) 9
Equinox(*) 10
(*) Fixed value fields (controlled vocabularies). Others are free text.
Table 1a
Extended Use Attributes
Use/Field Value
RA-min 100
RA-max 101
Dec-min 102
Dec-max 103
Time-min 104
Time-max 105
Temporal Resolution 106
Spatial Resolution 107
Wavelength Resolution 108
Name Resolver 109
Table 2
Relation Attributes
Relation Value
Less Than 1
Less than or equal 2
Equals 3 (default)
Greater Than 4
Greater than or equal 5
Not equal 6
Inside 7
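Because every attribute travels as a number, the form that builds a query and the server that reads it need the same lookup tables. The following Python sketch records Tables 1, 1a and 2 as IntEnum classes. The class and member names and the decode helper are illustrative only and are not part of the profile.

```python
from enum import IntEnum
from typing import Tuple

class Use(IntEnum):
    """Use Attributes from Tables 1 and 1a (member names are illustrative)."""
    NAME = 1
    RA = 2
    DEC = 3
    RADIUS = 4
    DATA_CLASS = 5
    DATA_TYPE = 6
    BANDPASS = 7
    TIME = 8
    OBSERVATORY = 9
    EQUINOX = 10
    RA_MIN = 100
    RA_MAX = 101
    DEC_MIN = 102
    DEC_MAX = 103
    TIME_MIN = 104
    TIME_MAX = 105
    TEMPORAL_RESOLUTION = 106
    SPATIAL_RESOLUTION = 107
    WAVELENGTH_RESOLUTION = 108
    NAME_RESOLVER = 109

class Relation(IntEnum):
    """Relation Attributes from Table 2."""
    LT = 1
    LE = 2
    EQ = 3  # marked as the default in Table 2
    GT = 4
    GE = 5
    NE = 6
    INSIDE = 7

# Use attributes whose terms come from a controlled vocabulary (marked (*) in Table 1).
CONTROLLED = {Use.DATA_CLASS, Use.DATA_TYPE, Use.BANDPASS, Use.OBSERVATORY, Use.EQUINOX}

def decode(use_code: str, rel_code: str = "3") -> Tuple[Use, Relation]:
    """Turn the numeric strings from an http stream into enum members.
    Raises ValueError for codes that are not defined in the tables."""
    return Use(int(use_code)), Relation(int(rel_code))
```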

Once these attributes have been defined, it is possible to formulate a basic search from an HTML form.

Example 1: Search for any information on 3C273
The http stream would look like:

term1=3C273&use1=1&rel1=3

Note that the numeric suffix shared by term1, use1 and rel1 allows multiple terms to be specified in a single query, as in the following examples.
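A client (or a gateway) can generate these numbered triples mechanically. A minimal Python sketch using the standard urllib.parse.urlencode; the build_query name and its handling of extra options such as ESN or PRS as keyword arguments are assumptions for illustration:

```python
from urllib.parse import urlencode

def build_query(clauses, **options):
    """Encode (term, use, rel) clauses as term1/use1/rel1, term2/use2/rel2, ...
    followed by any extra user options (e.g. ESN="F", PRS=3)."""
    pairs = []
    for i, (term, use, rel) in enumerate(clauses, start=1):
        pairs += [(f"term{i}", term), (f"use{i}", str(use)), (f"rel{i}", str(rel))]
    pairs += [(key, str(value)) for key, value in options.items()]
    # urlencode escapes reserved characters and turns spaces into '+'.
    return urlencode(pairs)
```

For instance, build_query([("catalog", 5, 3), ("80N", 3, 5)], ESN="F") yields the same stream as Example 5.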

Example 2: Search for UV flux observations of OJ287
The http stream would look like:

term1=OJ287&use1=1&rel1=3&term2=UV&use2=7&rel2=3&term3=flux&use3=6&rel3=3

Additionally, we might want to specify ranges of data.

Example 3: Search for any flux data taken at KPNO in June, 1987
The http stream would look like:

term1=flux&use1=6&rel1=3&term2=01-Jun-1987&use2=8&rel2=5&term3=30-Jun-1987&use3=8&rel3=2&term4=KPNO&use4=9&rel4=3
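On the receiving end, a server has to regroup the numbered parameters into clauses before it can run the search. A sketch assuming Python's urllib.parse; parse_clauses is a hypothetical helper, and filling in Equals when relN is absent follows the "(default)" marking in Table 2:

```python
import re
from urllib.parse import parse_qsl

# Matches the numbered clause parameters: term1, use1, rel1, term2, ...
CLAUSE_KEY = re.compile(r"^(term|use|rel)(\d+)$")

def parse_clauses(stream):
    """Return a list of (term, use, rel) tuples ordered by clause number.

    Parameters that are not part of a clause (ESN, PRS, NoField, ...) are skipped.
    A missing relN defaults to 3 (Equals).
    """
    groups = {}
    for key, value in parse_qsl(stream):
        m = CLAUSE_KEY.match(key)
        if m:
            groups.setdefault(int(m.group(2)), {})[m.group(1)] = value
    clauses = []
    for n in sorted(groups):
        g = groups[n]
        if "use" not in g:
            raise ValueError(f"term{n} has no matching use{n}")
        clauses.append((g.get("term", ""), int(g["use"]), int(g.get("rel", 3))))
    return clauses
```

The two Time clauses of a range query come back as separate tuples that together bracket the date range.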

We can also provide a list of items to search within a single use attribute. The items are separated by spaces, which appear as "+" in the http stream.

Example 4: Search for any information on 3C273 or OJ287 or M31
The http stream would look like:

term1=3C273+OJ287+M31&use1=1&rel1=3
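Since form encoding turns spaces into "+", decoding the term yields a space-separated list. A sketch with a hypothetical term_list helper, assuming whitespace is the item separator (the profile does not say how an object name that itself contains a space would be written):

```python
from urllib.parse import parse_qs

def term_list(stream, n=1):
    """Return the individual items of termN.
    parse_qs decodes '+' to a space; whitespace is assumed to separate items."""
    values = parse_qs(stream).get(f"term{n}", [""])
    return values[0].split()
```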

In addition to specifying the search terms, use attributes and relation attributes, there are a number of user parameters and options which can be defined to govern the interaction.

Element sets allow the user to request certain kinds of responses from the server. At least two element set names should be defined, Brief ("B") and Full ("F"), although only brief records must be supported. Brief records are generally used to send the user a short identifier that gives some idea of the contents of the full record. The brief record for a document, for example, would likely be its title; for a data catalog, it might be the name of the catalog. In general, the contents of the brief record are site-specific: data providers return what they deem relevant.

The user would request the full record when they want, for example, to retrieve actual data. From an archive of data in FITS format, the request ESN=B might return some minimal information from the FITS header, while ESN=F might return the actual FITS file. Other element sets could be defined, for example as combinations of use attributes, but such definitions are beyond the current scope. We use the characters "B" and "F" rather than numbers because single letters seem unambiguous enough to be useful, but numbers could certainly be used instead (TBD).

Table 3
Element Set Names
Value Description
B Minimal (brief) descriptive information about the item (default)
F Verbose description of the item, or the item itself
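A server's handling of ESN can be as simple as choosing between two representations of the same record. In this Python sketch the Record class, its title and payload fields, and the fallback to B for unrecognized element set names are all hypothetical:

```python
from dataclasses import dataclass

@dataclass
class Record:
    """Hypothetical record held by a data center."""
    title: str      # short identifier, returned for ESN=B
    payload: bytes  # the item itself (e.g. a FITS file), returned for ESN=F

def present(record: Record, esn: str = "B"):
    """Return the element set requested by ESN. B is the default and must be
    supported; unrecognized names fall back to B in this sketch."""
    if esn == "F":
        return record.payload
    return record.title
```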

Example 5: Search for catalog entries above 80N, and return the actual catalog entries
The http stream would look like:

term1=catalog&use1=5&rel1=3&term2=80N&use2=3&rel2=5&ESN=F

Although most interactions will initially be through the World Wide Web, there may be instances in the future where the user wants the results returned in some format other than HTML. The user can therefore name a preferred record syntax (PRS, Table 4); the server may honor the request, or ignore it when it is not relevant or not supported.

Table 4
Preferred Record Syntaxes
Name Value Description
HTML 1 Hypertext Markup Language (default)
Text 2 Plain ASCII text
Generic 3 Tagged, keyword=value or something parsable

It is expected that participating data systems will at least support the default values; that is, all should support requests to present brief records in HTML. Other requests can be declined or an alternative substituted: a request for tagged records, for example, could be filled if it is supported, declined, or answered with HTML instead.
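The fill, decline or substitute rule can be expressed as a small negotiation function. This sketch assumes a hypothetical server that supports HTML and plain text but not the generic tagged syntax; the MIME types and the decline flag are illustrative assumptions:

```python
from typing import Optional, Tuple

# Hypothetical server: supports PRS 1 (HTML) and 2 (Text), not 3 (Generic).
SUPPORTED = {1: "text/html", 2: "text/plain"}
DEFAULT_PRS = 1  # HTML is the default in Table 4

def negotiate(requested: Optional[str], decline: bool = False) -> Optional[Tuple[int, str]]:
    """Choose the record syntax for a response.

    requested is the raw PRS value from the stream (None if absent).
    Returns (prs, mime_type), or None if the server declines the request.
    """
    try:
        prs = int(requested) if requested is not None else DEFAULT_PRS
    except ValueError:
        prs = DEFAULT_PRS  # unparsable value: treat as if PRS were not given
    if prs in SUPPORTED:
        return prs, SUPPORTED[prs]        # fill the request
    if decline:
        return None                       # refuse the unsupported request
    return DEFAULT_PRS, SUPPORTED[DEFAULT_PRS]  # substitute HTML
```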

Example 6: Search for catalog entries above 80N, and return the actual catalog entries in the generic tagged format (specifics TBD)
The http stream would look like:

term1=catalog&use1=5&rel1=3&term2=80N&use2=3&rel2=5&ESN=F&PRS=3

Finally, users have a number of other parameters to specify. Table 5 contains an incomplete list of them:

Table 5
User Options
Name          Value(s)                  Description
NoField       1                         Ignore - ignore any unsupported use attributes and process the query as if they were not specified (default)
              2                         Abort - abort if any of the use attributes are not supported
StartRecNum   numeric                   Number of the first record to return (default=1)
NumRecsReq    numeric                   Maximum number of records to return (default=all)
DCName        AB server names (list)    Data center servers to search (default=all)
Nameserver    1                         None - no name resolution; pass the name directly to the data system (default)
              2                         SIMBAD - resolve the specified name into RA,Dec via SIMBAD
              3                         NED - resolve the specified name into RA,Dec via NED
ABver         1                         Version of the profile being used to define terms
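The NoField, StartRecNum and NumRecsReq options could be applied on the server roughly as follows. The helper names and the exception class are hypothetical; clauses are (term, use, rel) tuples:

```python
class UnsupportedField(Exception):
    """Raised when NoField=2 (Abort) and a clause uses an unsupported attribute."""

def apply_nofield(clauses, supported_uses, nofield=1):
    """Filter clauses according to NoField: 1 = Ignore (default), 2 = Abort."""
    kept = []
    for term, use, rel in clauses:
        if use in supported_uses:
            kept.append((term, use, rel))
        elif nofield == 2:
            raise UnsupportedField(f"use attribute {use} is not searchable here")
        # nofield == 1: drop the clause as if it had not been specified
    return kept

def page(results, start_rec_num=1, num_recs_req=None):
    """Apply StartRecNum (1-based) and NumRecsReq (None means all records)."""
    start = max(start_rec_num, 1) - 1
    end = None if num_recs_req is None else start + num_recs_req
    return results[start:end]
```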

Example 7: Search for any image data of M31 in the HST and HEASARC archives. Resolve the name at SIMBAD, abort if the server is unable to search on Data Type, and return no more than 20 records.
The http stream would look like:

ABver=1&term1=M31&use1=1&rel1=3&term2=image&use2=6&rel2=3&NoField=2&NumRecsReq=20&Nameserver=2&DCName=HST&DCName=HEASARC
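Example 7 repeats DCName once per server, so a client or server must keep repeated keys rather than collapsing the parameters into a simple one-value dictionary. A sketch using urllib.parse; the helper names are hypothetical, and treating an absent DCName as "all servers" follows the Table 5 default:

```python
from urllib.parse import parse_qs, urlencode

def with_servers(base_params, servers):
    """Append one DCName=... pair per requested data center.
    base_params is a list of (key, value) pairs, so order and repeats are kept."""
    return urlencode(list(base_params) + [("DCName", s) for s in servers])

def requested_servers(stream, all_servers):
    """Return every DCName value in the stream; absent DCName means all servers."""
    return parse_qs(stream).get("DCName", list(all_servers))
```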

Extended Services

In addition to being able to submit a query to the various data center servers, a number of additional services have been identified as being desirable.

Name resolution
We have defined a user parameter which specifies a name resolver. The intention is to submit the name of an object to one of the existing services like SIMBAD or NED, and to retrieve a "standard" position in return - the idea being that it is better to search data catalogs by position than to search by object name.
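One way a gateway might use a resolver is to rewrite Name clauses into position clauses before forwarding the query. In this sketch the resolver is a caller-supplied function standing in for a SIMBAD or NED lookup; no real service interface is assumed, and resolve_names is a hypothetical helper:

```python
from typing import Callable, List, Tuple

Clause = Tuple[str, int, int]  # (term, use, rel)

def resolve_names(clauses: List[Clause],
                  resolver: Callable[[str], Tuple[str, str]]) -> List[Clause]:
    """Replace each Name clause (use 1) by RA (use 2) and Dec (use 3) clauses
    built from the position the resolver returns; other clauses pass through.
    A real query would probably also carry a Radius (use 4) clause."""
    out: List[Clause] = []
    for term, use, rel in clauses:
        if use == 1:
            ra, dec = resolver(term)
            out += [(ra, 2, 3), (dec, 3, 3)]
        else:
            out.append((term, use, rel))
    return out
```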
Gateway/exploder Services
In order to distribute a query to more than one data center server, some sort of gateway script is required. This gateway can also provide a number of enhanced functions which can reduce the complexity of the individual data center servers. For example, the gateway could be written to handle the conversion of coordinates from a variety of formats into a canonical format, thereby eliminating the need for each data center server to do that conversion.
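As an example of such a conversion, the sketch below accepts right ascension in sexagesimal hours and declination in sexagesimal degrees (signed, or with an N/S suffix as in "80N") and returns decimal degrees. Decimal degrees as the canonical form is an assumption here, since canonical forms are still an outstanding issue, and so is the rule that a bare RA number is already in degrees:

```python
import re

def ra_to_degrees(text: str) -> float:
    """RA as decimal degrees ('187.5') or sexagesimal hours ('12:29:06.7',
    '12 29 06.7'); returns decimal degrees."""
    parts = re.split(r"[:\s]+", text.strip())
    if len(parts) == 1:
        return float(parts[0])  # assumed to be degrees already
    h, m, s = (float(p) for p in parts + ["0"] * (3 - len(parts)))
    return 15.0 * (h + m / 60.0 + s / 3600.0)  # 1 hour of RA = 15 degrees

def dec_to_degrees(text: str) -> float:
    """Dec as decimal or sexagesimal degrees ('+02:03:08.6', '80N', '12.5S')."""
    t = text.strip().upper()
    sign = 1.0
    if t.endswith(("N", "S")):  # hemisphere suffix
        sign = -1.0 if t.endswith("S") else 1.0
        t = t[:-1]
    if t.startswith("-"):  # sign applies to the whole value, even '-00:30:00'
        sign = -sign
    parts = re.split(r"[:\s]+", t.lstrip("+-").strip())
    d, m, s = (float(p) for p in parts + ["0"] * (3 - len(parts)))
    return sign * (d + m / 60.0 + s / 3600.0)
```

The sign is read from the text rather than from the degrees field, so a declination such as -00:30:00 converts to -0.5 rather than +0.5.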
Resource Discovery
Each data center server should be able to respond to a generic "explain" query by providing some descriptive or summary information about its holdings. The syntax to be used is as yet undefined, but might look something like:

ABver=1&EXPLAIN&PRS=1

One could conceivably also provide this type of descriptive information about a single holding or catalog at the site with a query like

ABver=1&EXPLAIN&DBName=&PRS=1
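Because EXPLAIN is a bare keyword with no "=", a server parsing the stream with Python's urllib.parse must ask for blank values to be kept; by default parse_qsl silently drops both a bare key and an empty value such as DBName=. A sketch with a hypothetical explain_request helper, which treats an empty or missing DBName as a request about the whole site (an assumption):

```python
from urllib.parse import parse_qsl

def explain_request(stream):
    """Return (is_explain, dbname) for a stream such as 'ABver=1&EXPLAIN&PRS=1'.
    keep_blank_values=True keeps the bare 'EXPLAIN' key and an empty 'DBName='."""
    params = dict(parse_qsl(stream, keep_blank_values=True))
    if "EXPLAIN" not in params:
        return False, None
    dbname = params.get("DBName") or None  # empty or absent: describe the whole site
    return True, dbname
```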

Outstanding Issues

I think making HTML the default for PRS is a mistake - it should be plain text, since that's simplest to construct.
We need to define canonical forms and reference points for dates and times, positions and radius.

Table Notes

Table A1 Notes: We need someone to write descriptions of the data classes, and to suggest other data classes which one or more data centers wish to support for searching. Same for Table A2.
Table A3 Notes: We need descriptions for the bandpasses. Also, we refer in the 1st Workshop Report to various types of X-ray and UV data. Should those be distinct values or not?
Table A4 Notes: We need to fill in the allowed values for this table. Shall we (initially) include just the participants in building the prototype, or should we try to make the list comprehensive?

Summary

This document is intended to be dynamic, and to reflect the current practice and needs of the astronomical community. Updates, additions, suggestions and corrections may be submitted to the principal author, or to the AstroBrowse List Server.

In particular, additions to the controlled vocabulary values are needed, as are descriptions of the various elements in the tables.

Archie Warnock
A/WWW Enterprises

Appendix - Controlled Vocabulary Attribute Values

Table A1
Data Class (Use Attribute 5)
Name Value Description
Pointed Observation 1  
Catalog 2  
Survey 3  
Reference Data 4  
Simulation 5  
Table A2
Data Type (Use Attribute 6)
Name Value Description
Image 1  
Spectrum 2  
Time Series 3  
Flux measurement (photometric) 4
Visibility Data 5  
Table A3
Bandpass (Use Attribute 7)
Name Value Description
Gamma-ray 1  
X-ray 2  
Ultraviolet 3  
Optical 4  
Infrared 5  
Millimeter 6  
Radio 7  
Table A4
Observatory/Mission/Project (Use Attribute 9)
Name Value Description
Space Telescope 1  
Kitt Peak National Observatory 2  
Cerro Tololo InterAmerican Observatory 3  
National Radio Astronomy Observatory 4  
Table A5
Equinox (Use Attribute 10)
Name Value Description
J2000 1  
B1950 2  
B1855 3