
SOME GUIDELINES FOR DEVELOPERS USING SDB
========================================


The Scientific Data Browser (SDB) is a
Common Gateway Interface (CGI) program written
in C that reads files in different formats and displays the
contents of those files in HTML so that they can be read via the
Web.

Currently the SDB reads the HDF and netCDF file formats. There are
also two other versions of the code: one reads FITS, the other
reads the Common Data Format (CDF).
The code may be redesigned in the future so that it will
be relatively easy for developers to customize the SDB
to read and display files in their own file formats.
The purpose of this document is to explain the structure
of the SDB to developers who want to expand its capabilities to
read new file formats and display them via HTML.

Here are the SDB directories and a description
of the function of the code within each directory:


MAIN
****

The main directory contains only main.c and the top-level makefile
for the SDB. To build the SDB, a user should only have to change
into the main directory and type "make". After a successful build,
the executable "sdb" resides in the main directory.

main.c is the top-level C file of the SDB; the routines in main.c
are called on every new request to the SDB CGI program.
Keep in mind that, as with any CGI program, one browse session
is typically composed of MANY separate calls to the CGI program,
one for every new form displayed. Each new form means a new
run of the program, starting in main.c.

main.c embodies the functions necessary for any Common Gateway
Interface program: it receives the user's request via the
Web server, reads the environment variables made available by
the CGI mechanism, and parses the data sent in the QUERY_STRING
environment variable. That data comes from the user and defines
the user's request; typically it includes the name of the file
to be viewed, etc.
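To make the parsing step concrete, here is a minimal sketch of pulling one named parameter out of a query string and URL-decoding it. The helpers cgi_get_param() and cgi_url_decode(), and the parameter names in the example, are hypothetical; the SDB's own parsing code in main.c may be organized quite differently.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: decode a URL-encoded CGI value in place.
 * '+' becomes a space and "%XX" becomes the byte with hex value XX. */
static void cgi_url_decode(char *s)
{
    char *out = s;

    while (*s != '\0') {
        if (*s == '+') {
            *out++ = ' ';
            s++;
        } else if (*s == '%' && isxdigit((unsigned char)s[1])
                   && isxdigit((unsigned char)s[2])) {
            char hex[3];
            hex[0] = s[1];
            hex[1] = s[2];
            hex[2] = '\0';
            *out++ = (char)strtol(hex, NULL, 16);
            s += 3;
        } else {
            *out++ = *s++;
        }
    }
    *out = '\0';
}

/* Hypothetical helper: look up `key` in a query string such as
 * "file=%2Fdata%2Fx.hdf&mode=2". On success the decoded value is copied
 * into out (truncated to outlen - 1 bytes) and 1 is returned; else 0. */
int cgi_get_param(const char *query, const char *key, char *out, size_t outlen)
{
    size_t keylen = strlen(key);
    const char *p = query;

    if (query == NULL || outlen == 0)
        return 0;
    while (p != NULL && *p != '\0') {
        const char *amp = strchr(p, '&');
        size_t pairlen = amp ? (size_t)(amp - p) : strlen(p);

        /* Match "key=" at the start of this key=value pair. */
        if (pairlen > keylen && strncmp(p, key, keylen) == 0 && p[keylen] == '=') {
            size_t vlen = pairlen - keylen - 1;

            if (vlen >= outlen)
                vlen = outlen - 1;
            memcpy(out, p + keylen + 1, vlen);
            out[vlen] = '\0';
            cgi_url_decode(out);
            return 1;
        }
        p = amp ? amp + 1 : NULL;
    }
    return 0;
}
```

In a real CGI program the query would come from getenv("QUERY_STRING"), which can be NULL if the server did not set it; the NULL check above covers that case.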

Since HTTP is a stateless protocol, each new user request
passed to a CGI program like the SDB must contain all the
data and information needed to process that request.
In the SDB, all data that is passed between separate calls to
the CGI program is stored in the lvars data structure, defined in
sdb.h. When main parses a user's request, or when routines below
main process information about the file, this information is
stored, mostly in linked lists, in the lvars data structure.
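The text does not say how lvars travels from one call to the next. One common CGI technique, assumed here rather than taken from sdb.h, is to echo the state back to the browser as hidden form fields, so the next request carries it back to the program. The struct sdb_state and its fields are a hypothetical, much-reduced stand-in for lvars.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical, much-reduced stand-in for lvars; the real structure in
 * sdb.h has different and many more fields, including the object lists. */
struct sdb_state {
    const char *filename;   /* file being browsed */
    int display_mode;       /* which form was shown last */
    int object_index;       /* which object the user selected */
};

/* Write the state as HTML hidden fields so the next request carries it
 * back to the CGI program. Output goes into buf (size buflen).
 * Returns the snprintf result, or -1 if the file name contains a double
 * quote, which would break the VALUE attribute (a fuller version would
 * HTML-escape the name instead of rejecting it). */
int sdb_state_to_hidden_fields(const struct sdb_state *st, char *buf, size_t buflen)
{
    if (strchr(st->filename, '"') != NULL)
        return -1;
    return snprintf(buf, buflen,
                    "<INPUT TYPE=\"hidden\" NAME=\"file\" VALUE=\"%s\">\n"
                    "<INPUT TYPE=\"hidden\" NAME=\"display_mode\" VALUE=\"%d\">\n"
                    "<INPUT TYPE=\"hidden\" NAME=\"index\" VALUE=\"%d\">\n",
                    st->filename, st->display_mode, st->object_index);
}
```

The next call then recovers each field from the request just like any other user-supplied parameter.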

main.c determines the type of the file to be viewed and invokes
the file-specific high-level routines that satisfy the
user's command. These file-specific routines are currently in sdbhdf/*
and are built into the libsdbhdf.a library.
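The text does not say how main.c recognizes the file type. One common approach, assumed here, is to check the file's leading "magic number" bytes. The signatures tested below are the standard ones for HDF4 and classic netCDF files; the enum and function names are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical file-type codes; SDB's own identifiers may differ. */
enum sdb_filetype { FT_UNKNOWN, FT_HDF4, FT_NETCDF };

/* Classify a file from its first four bytes. HDF4 files start with
 * 0x0e 0x03 0x13 0x01; classic netCDF files start with 'C' 'D' 'F'
 * followed by a version byte (1, or 2 for the 64-bit-offset variant). */
enum sdb_filetype sdb_classify_magic(const unsigned char *hdr, size_t n)
{
    static const unsigned char hdf4[4] = { 0x0e, 0x03, 0x13, 0x01 };

    if (n >= 4 && memcmp(hdr, hdf4, 4) == 0)
        return FT_HDF4;
    if (n >= 4 && memcmp(hdr, "CDF", 3) == 0 && (hdr[3] == 1 || hdr[3] == 2))
        return FT_NETCDF;
    return FT_UNKNOWN;
}

/* Read the first bytes from disk and classify them. */
enum sdb_filetype sdb_classify_file(const char *path)
{
    unsigned char hdr[4];
    size_t n;
    FILE *fp = fopen(path, "rb");

    if (fp == NULL)
        return FT_UNKNOWN;
    n = fread(hdr, 1, sizeof hdr, fp);
    fclose(fp);
    return sdb_classify_magic(hdr, n);
}
```

Because libsdbhdf.a handles both formats, FT_HDF4 and FT_NETCDF would both be routed to it; FT_UNKNOWN is a natural point to print an error page.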


SDBUTIL
*******

The code in this directory builds libsdbutil.a, a library of
general utilities that are NOT dependent in any way on file-type-specific
routines. This library does not import any HDF header
files or use any other file-format-specific constructs.
This library can, and should, be used by new SDB libraries that
read new file formats.

This directory contains several general types of functions:

==>Generic Linked List Manipulation Functions 

(glist.c) Generic linked lists are used for storing data
about the file. These linked lists are stored in the lvars struct.
For example, libsdbhdf.a stores all the data about the raster images
within a given HDF file in one generic linked list, contained in lvars.
If an HDF file has 3 raster images, then that linked list contains
three elements.
A common glist.c function used by libsdbhdf.a is perform_on_list(),
which takes a function pointer as an argument and applies that
function to each element in the generic linked list. If you pass
perform_on_list() a function that prints a raster image element and
displays it as HTML, then perform_on_list() will apply that function
to each element in the linked list of raster image elements. This is
the fundamental way the SDB works for each object type it supports.
For HDF, there are linked lists for all the HDF object types: raster
images, palettes, scientific data sets, annotations, etc.
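The list-plus-iterator pattern above can be sketched as follows. perform_on_list() is named in the text, but its real signature is not given; the version below, which passes an extra context pointer to each call, is an assumption, as are the node layout and the helpers glist_append(), glist_free(), and count_element().

```c
#include <stdlib.h>

/* Hypothetical generic list node; glist.c's real layout may differ. */
struct glist {
    void *data;             /* points at a file-specific element */
    struct glist *next;
};

/* Append data at the tail so elements keep file order (image 1, 2, 3...).
 * Returns the head; on allocation failure the list is left unchanged. */
struct glist *glist_append(struct glist *head, void *data)
{
    struct glist *node = malloc(sizeof *node);
    struct glist *p;

    if (node == NULL)
        return head;
    node->data = data;
    node->next = NULL;
    if (head == NULL)
        return node;
    for (p = head; p->next != NULL; p = p->next)
        ;
    p->next = node;
    return head;
}

/* Apply fn to every element in order; ctx is passed through unchanged
 * (for example an output stream or a counter). */
void perform_on_list(struct glist *head, void (*fn)(void *data, void *ctx), void *ctx)
{
    for (; head != NULL; head = head->next)
        fn(head->data, ctx);
}

/* Free the list nodes, but not the elements they point at. */
void glist_free(struct glist *head)
{
    while (head != NULL) {
        struct glist *next = head->next;
        free(head);
        head = next;
    }
}

/* Example iterator: count the elements; ctx points at an int. */
void count_element(void *data, void *ctx)
{
    (void)data;
    ++*(int *)ctx;
}
```

With three raster images appended, perform_on_list(list, count_element, &n) leaves n at 3; an HTML printer would be passed in exactly the same way, with stdout as its context.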

New file types should first classify the types of objects supported by
that file type and then define a data structure that describes each
object. Each instance of this data structure is an element in a linked
list. Then functions should be developed that present this element
in HTML form.
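As an illustration of those two steps for a format the SDB already has a separate version for, here is a sketch of a list element for one FITS header-data unit plus a presentation function that renders it as an HTML table row. The struct, its fields, and the function name are hypothetical; a real FITS library for the SDB would choose its own.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical element describing one FITS header-data unit (HDU).
 * A FITS library for the SDB would keep one of these per HDU in a list. */
struct fits_hdu_elem {
    int index;          /* position of the HDU in the file (0 = primary) */
    const char *kind;   /* e.g. "PRIMARY", "IMAGE", "BINTABLE" */
    int naxis;          /* number of axes */
    long naxes[3];      /* axis lengths; only the first 3 are kept here */
};

/* Presentation function: render one element as an HTML table row into buf.
 * Returns the snprintf result for the row. */
int fits_hdu_to_html_row(const struct fits_hdu_elem *e, char *buf, size_t len)
{
    char dims[96];      /* room for 3 longs plus " x " separators */
    size_t used = 0;
    int i;

    dims[0] = '\0';
    for (i = 0; i < e->naxis && i < 3; i++) {
        int n = snprintf(dims + used, sizeof dims - used, "%s%ld",
                         i > 0 ? " x " : "", e->naxes[i]);
        if (n < 0 || (size_t)n >= sizeof dims - used)
            break;      /* cannot happen with 96 bytes, but stay safe */
        used += (size_t)n;
    }
    return snprintf(buf, len, "<TR><TD>%d</TD><TD>%s</TD><TD>%s</TD></TR>\n",
                    e->index, e->kind, dims);
}
```

The presenter writes into a buffer so it is easy to test; an SDB iterator would print the same row to standard output instead.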


==> Routines for Handling GIF Encoding of Images:
( flgifc.c, flgife.c, flgifw.c )
These routines take subsetted image data from within a file and
create a thumbnail browse image in GIF format for display by the
Web browser.
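The subsetting half of that job can be sketched as simple stride subsampling of an 8-bit raster, whose output would then be handed to the GIF encoder. Whether flgifc.c and its companions reduce images exactly this way is not stated; subsample_raster() and thumb_stride() are hypothetical helpers.

```c
#include <stddef.h>

/* Subsample an 8-bit raster (w x h, row-major) by keeping every
 * `stride`-th pixel in each direction. thumb must hold
 * ceil(w/stride) * ceil(h/stride) bytes. Sets *tw and *th to the
 * thumbnail size and returns the number of pixels written. */
size_t subsample_raster(const unsigned char *img, size_t w, size_t h, size_t stride,
                        unsigned char *thumb, size_t *tw, size_t *th)
{
    size_t x, y, n = 0;

    if (stride == 0)
        stride = 1;
    *tw = (w + stride - 1) / stride;
    *th = (h + stride - 1) / stride;
    for (y = 0; y < h; y += stride)
        for (x = 0; x < w; x += stride)
            thumb[n++] = img[y * w + x];
    return n;
}

/* Smallest stride that keeps the larger image side within max_dim pixels. */
size_t thumb_stride(size_t w, size_t h, size_t max_dim)
{
    size_t big = w > h ? w : h;

    if (max_dim == 0 || big <= max_dim)
        return 1;
    return (big + max_dim - 1) / max_dim;
}
```

For a 1000 x 500 image and a 100-pixel limit, thumb_stride() gives 10, producing a 100 x 50 thumbnail.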


==> Routines for String Parsing, Manipulation:
( myutil.c, util.c)

==> Routines for Presenting the Data in HTML Format:
( html_util.c, htmlstrings.h) 

html_util.c contains routines that print a label with the
appropriate HTML tags, print data from a linked list as HTML radio
buttons or checkboxes, print data from a linked list into an HTML
"selector" (a <SELECT> list), and print data from a linked list
into an HTML table. To perform any new HTML output function, new
libraries should use the routines in html_util.c, or add new
functions to html_util.c, rather than hardcoding HTML tags directly
into fprintf() calls (printing to standard output) within the new
library. As a quick and dirty hack, one you will curse yourself for
later down the road, an fprintf() to standard output anywhere in the
code will work. Again, take our word for it; we have been there. Use
an HTML presentation routine to print data in HTML form.
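A minimal sketch of what such presentation routines can look like: a label printer and a <SELECT> printer that also escape HTML special characters, so callers never write tags themselves. These function names and signatures are hypothetical, not the actual contents of html_util.c.

```c
#include <stdio.h>
#include <string.h>

/* Write text with the HTML special characters escaped. */
void html_put_escaped(FILE *out, const char *s)
{
    for (; *s != '\0'; s++) {
        switch (*s) {
        case '&': fputs("&amp;", out); break;
        case '<': fputs("&lt;", out); break;
        case '>': fputs("&gt;", out); break;
        case '"': fputs("&quot;", out); break;
        default:  fputc(*s, out); break;
        }
    }
}

/* Print a section label, e.g. <H2>Scientific Data Sets</H2>. */
void html_label(FILE *out, int level, const char *text)
{
    fprintf(out, "<H%d>", level);
    html_put_escaped(out, text);
    fprintf(out, "</H%d>\n", level);
}

/* Print a <SELECT> with one <OPTION> per name. field is a program-chosen
 * form name and is not escaped; the names come from the file and are. */
void html_select(FILE *out, const char *field, const char *const *names, size_t count)
{
    size_t i;

    fprintf(out, "<SELECT NAME=\"%s\">\n", field);
    for (i = 0; i < count; i++) {
        fputs("<OPTION>", out);
        html_put_escaped(out, names[i]);
        fputc('\n', out);
    }
    fputs("</SELECT>\n", out);
}
```

In the CGI program out would simply be stdout; taking a FILE * keeps the routines testable and reusable.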

One advantage of this approach is that if you ever have to print
output in some other format (for example, to a Java applet, or as
ASCII), the mechanism for printing the data already exists: you only
have to copy the HTML presentation routine to a new procedure, change
the output format tags (<H2>, <UL>, etc.), and invoke the new
presentation routine.
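The text describes copying a routine and changing its tags. A variation on that idea, not what the text itself prescribes, is to pass the tags in as data so that one routine serves several output formats. The struct out_format, the two format tables, and present_list() are hypothetical; titles and items are assumed to be already safe to print.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical: the markup for one output format, bundled as data. */
struct out_format {
    const char *heading_open, *heading_close;
    const char *list_open, *list_close;
    const char *item_open, *item_close;
};

const struct out_format HTML_FMT  = { "<H2>", "</H2>\n", "<UL>\n", "</UL>\n", "<LI>", "\n" };
const struct out_format ASCII_FMT = { "",     "\n",      "",       "",        "  - ", "\n" };

/* One presenter for every format: a heading followed by a list of items. */
void present_list(FILE *out, const struct out_format *f, const char *title,
                  const char *const *items, size_t n)
{
    size_t i;

    fprintf(out, "%s%s%s%s", f->heading_open, title, f->heading_close, f->list_open);
    for (i = 0; i < n; i++)
        fprintf(out, "%s%s%s", f->item_open, items[i], f->item_close);
    fputs(f->list_close, out);
}
```

Adding a new output format then means adding one table, not another copy of the routine.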

Another advantage is EASY CUSTOMIZATION.
If you must change a label or a table header, you change
it in one place, not in N places embedded throughout
the code. Currently all labels, table headers, table labels, and
page headers are in the file htmlstrings.h. If you want to change
any of these words, simply edit this file, recompile, and reinstall,
and the change will be made globally.
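A sketch of the htmlstrings.h idea: every user-visible word is a macro, and adjacent string literals are joined at compile time, so an edit to one macro reaches every page that uses it. The macro names and label texts are hypothetical, not the actual contents of htmlstrings.h.

```c
#include <stdio.h>
#include <string.h>

/* htmlstrings.h-style labels, each defined exactly once. */
#define LBL_SDS_HEADER  "Scientific Data Sets"
#define LBL_COL_NAME    "Name"
#define LBL_COL_RANK    "Rank"

/* Adjacent literals are concatenated by the compiler, so this costs
 * nothing at run time and picks up any change to the labels above. */
#define SDS_TABLE_HEADER_HTML \
    "<H2>" LBL_SDS_HEADER "</H2>\n" \
    "<TABLE>\n<TR><TH>" LBL_COL_NAME "</TH><TH>" LBL_COL_RANK "</TH></TR>\n"

/* Print the header of the dataset table; the caller closes the table. */
void print_sds_table_header(FILE *out)
{
    fputs(SDS_TABLE_HEADER_HTML, out);
}
```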


SDBHDF
******
This is the main library for reading HDF and netCDF files, extracting
the data about the objects within each file, and displaying that
data in HTML form.
As described in the HDF library documentation, each type of data
object in an HDF file is identified by a tag; an individual object
is identified by its tag together with a reference number.

The file sdb.c is at the topmost level: it initializes the
lvars structure and contains the topmost routine, sdbGrokFile().
sdbGrokFile() takes a whole file, analyzes the HDF tags in it, and
returns a view of all the top-level objects within the file
(i.e., all datasets, raster images, vgroups, vdata, etc.).
Essentially, this function calls functions in the files
listed below, which build the generic linked lists that describe
the object types within that file. These linked lists are
pointed to by the lvars struct and passed around between
invocations of the CGI program.
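The sorting step at the heart of such a routine can be sketched without the HDF library: walk a directory of objects once and file each one under its kind. The object kinds, struct dir_entry, struct grok_result, and grok_directory() are all hypothetical; real code would map HDF tags (from the HDF headers) onto kinds and build generic linked lists rather than the fixed-size arrays used here for brevity.

```c
#include <stddef.h>

/* Hypothetical object kinds; real code would derive them from HDF tags. */
enum obj_kind { OBJ_RASTER, OBJ_PALETTE, OBJ_SDS, OBJ_VGROUP, OBJ_VDATA,
                OBJ_OTHER, OBJ_NKINDS };

/* One entry of a (hypothetical) object directory read from the file. */
struct dir_entry {
    enum obj_kind kind;
    int ref;            /* reference number of the object */
};

/* Reduced stand-in for the lists hung off lvars. */
#define MAX_PER_KIND 64
struct grok_result {
    int refs[OBJ_NKINDS][MAX_PER_KIND];
    size_t count[OBJ_NKINDS];
};

/* Walk the directory once and file each object under its kind. */
void grok_directory(const struct dir_entry *dir, size_t n, struct grok_result *r)
{
    size_t i;
    int k;

    for (k = 0; k < OBJ_NKINDS; k++)
        r->count[k] = 0;
    for (i = 0; i < n; i++) {
        int kind = (int)dir[i].kind;

        if (kind < 0 || kind >= OBJ_NKINDS)
            kind = OBJ_OTHER;          /* defensive: unknown kinds */
        if (r->count[kind] < MAX_PER_KIND)
            r->refs[kind][r->count[kind]++] = dir[i].ref;
    }
}
```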

These HDF object type routines are:
raster images (sdb_ri.c)
palettes (sdb_pal.c)
scientific data sets ( sdb_sds.c, sdsdump.c, sdssub.c)
vgroups (sdb_vg.c)
vdata (sdb_vd.c, show.c, vdsub.c)
attributes (sdb_attr.c)
The above files contain "iterator functions" that are passed to
perform_on_list() to print the structures as HTML, add elements to
that data type's linked list, free the linked list, and print a
data element as an entry in an HTML table.
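Here is a sketch of what a pair of such iterator functions might look like for raster images: one prints an element as an HTML table row, the other frees it. Both share a (void *element, void *context) signature, which is an assumption about how perform_on_list() calls them; struct ri_elem, ri_new(), ri_print_row(), and ri_free() are hypothetical, not the contents of sdb_ri.c.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical list element for one raster image. */
struct ri_elem {
    int ref;            /* HDF reference number of the image */
    long width, height;
    int has_palette;
};

/* Constructor used when adding an element to the raster image list. */
struct ri_elem *ri_new(int ref, long w, long h, int has_pal)
{
    struct ri_elem *e = malloc(sizeof *e);

    if (e != NULL) {
        e->ref = ref;
        e->width = w;
        e->height = h;
        e->has_palette = has_pal;
    }
    return e;
}

/* Iterator: print one element as an HTML table row; ctx is the FILE *. */
void ri_print_row(void *elem, void *ctx)
{
    const struct ri_elem *e = elem;

    fprintf((FILE *)ctx, "<TR><TD>%d</TD><TD>%ld x %ld</TD><TD>%s</TD></TR>\n",
            e->ref, e->width, e->height, e->has_palette ? "yes" : "no");
}

/* Iterator: release one element; ctx is unused. */
void ri_free(void *elem, void *ctx)
{
    (void)ctx;
    free(elem);
}
```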

The sdbhdf directory also contains source files that plot vdata as
an xy plot (plot.c, plotgif.c).
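plot.c and plotgif.c are not described further, so this is only the assumed core of any such plot: find the range of a vdata field, then map each value linearly onto pixel coordinates. data_range() and data_to_pixel() are hypothetical helpers.

```c
#include <stddef.h>

/* Find the smallest and largest values of one vdata field. */
void data_range(const double *v, size_t n, double *lo, double *hi)
{
    size_t i;

    *lo = *hi = (n > 0) ? v[0] : 0.0;
    for (i = 1; i < n; i++) {
        if (v[i] < *lo)
            *lo = v[i];
        if (v[i] > *hi)
            *hi = v[i];
    }
}

/* Map v from [lo, hi] onto pixels 0..npix-1, clamping out-of-range values.
 * Pass flip = 1 for the y axis, because image rows grow downward. */
int data_to_pixel(double v, double lo, double hi, int npix, int flip)
{
    double t;
    int p;

    if (hi <= lo || npix <= 1)
        return 0;
    t = (v - lo) / (hi - lo);
    if (t < 0.0)
        t = 0.0;
    if (t > 1.0)
        t = 1.0;
    p = (int)(t * (npix - 1) + 0.5);    /* round to nearest pixel */
    return flip ? (npix - 1) - p : p;
}
```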

The sdbhdf directory also contains some utility routines that are
specific to HDF, for example because they use HDF number types:
hdfutil.c, sdb_util.c, and sdb_img.c.
The routine hdf_print() in hdfutil.c is a typical utility
routine. It formats and prints an HDF number according to
its number type.
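The idea behind a routine like hdf_print() can be sketched as a switch on the number type. The enum below is a hypothetical stand-in for the HDF library's DFNT_* number-type constants (real code would include the HDF headers and switch on those), and format_number() is not hdf_print()'s actual signature. The sketch assumes the bytes are already in native byte order.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for HDF's DFNT_* number-type constants. */
enum num_type { NT_CHAR8, NT_INT8, NT_UINT8, NT_INT16, NT_UINT16,
                NT_INT32, NT_UINT32, NT_FLOAT32, NT_FLOAT64 };

/* Format the value at p according to its number type. memcpy avoids
 * misaligned reads from a raw data buffer. Returns the snprintf result. */
int format_number(char *buf, size_t len, const void *p, enum num_type nt)
{
    switch (nt) {
    case NT_CHAR8:   { char v;     memcpy(&v, p, sizeof v); return snprintf(buf, len, "%c", v); }
    case NT_INT8:    { int8_t v;   memcpy(&v, p, sizeof v); return snprintf(buf, len, "%d", (int)v); }
    case NT_UINT8:   { uint8_t v;  memcpy(&v, p, sizeof v); return snprintf(buf, len, "%u", (unsigned)v); }
    case NT_INT16:   { int16_t v;  memcpy(&v, p, sizeof v); return snprintf(buf, len, "%d", (int)v); }
    case NT_UINT16:  { uint16_t v; memcpy(&v, p, sizeof v); return snprintf(buf, len, "%u", (unsigned)v); }
    case NT_INT32:   { int32_t v;  memcpy(&v, p, sizeof v); return snprintf(buf, len, "%ld", (long)v); }
    case NT_UINT32:  { uint32_t v; memcpy(&v, p, sizeof v); return snprintf(buf, len, "%lu", (unsigned long)v); }
    case NT_FLOAT32: { float v;    memcpy(&v, p, sizeof v); return snprintf(buf, len, "%g", (double)v); }
    case NT_FLOAT64: { double v;   memcpy(&v, p, sizeof v); return snprintf(buf, len, "%g", v); }
    }
    return snprintf(buf, len, "?");     /* unknown number type */
}
```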

If a new file type is to be supported by the SDB, a new library
like libsdbhdf.a should be added that performs functions similar to
those in sdbhdf/*.c but that depends, of course, on the functions,
data structures, and capabilities of the new file type. Note that one
SHOULD NOT simply add code to the sdbhdf/*.c routines to support the
new file type. This "non-modular" development strategy works for
short-term objectives (quick "hacks") but has proven to lead
to unmaintainable, difficult-to-support code. Please take our
advice and BUILD A NEW MODULE; you will be happy you did.
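One way to keep such modules separate, sketched here as an assumption rather than a description of the current SDB, is for each library to export a small table entry that main.c searches instead of containing format-specific tests. struct sdb_module, the FITS probe, and sdb_find_module() are hypothetical; the probe relies on the fact that a FITS primary header begins with the keyword SIMPLE padded to eight columns, followed by "=".

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical interface a file-format module could export. */
struct sdb_module {
    const char *name;
    int (*probe)(const unsigned char *hdr, size_t n);  /* 1 if we can read it */
    int (*grok)(const char *path);                     /* build object lists */
};

/* FITS files begin with "SIMPLE" padded to 8 columns, then '='. */
static int fits_probe(const unsigned char *hdr, size_t n)
{
    return n >= 9 && memcmp(hdr, "SIMPLE  =", 9) == 0;
}

/* Placeholder: a real module would parse the headers and fill its lists. */
static int fits_grok(const char *path)
{
    (void)path;
    return 0;
}

static const struct sdb_module modules[] = {
    { "fits", fits_probe, fits_grok },
    /* one entry per file-format library */
};

/* Return the first module whose probe accepts the header, or NULL. */
const struct sdb_module *sdb_find_module(const unsigned char *hdr, size_t n)
{
    size_t i;

    for (i = 0; i < sizeof modules / sizeof modules[0]; i++)
        if (modules[i].probe(hdr, n))
            return &modules[i];
    return NULL;
}
```

Supporting another format then means writing a new module and adding one table entry, leaving sdbhdf/*.c untouched.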



Important Note about the handling of State in the SDB
+++++++++++++++++++++++++++++++++++++++++++++++++++++

The SDB has a finite number of states. Because the SDB
is called over HTTP, a stateless protocol, each time the
program is called it must have all the information it
needs to process the request. One piece of needed information
is "where was I last? Which state was I in when I last
stopped processing?" The state you were last in corresponds
to the HTML form that should be displayed next. Currently
the states in the SDB are represented in an ad hoc manner
through the variable display_mode. A condition such as
if (display_mode == 2) asks which state the SDB was in
when it last exited.

For example, scientific dataset data can be displayed in a
variety of forms, depending on the state you were last in when
you submitted a new Web request and re-invoked the SDB program.

 The three display modes are:
  (1) display the scroll boxes containing the
      names of the scientific datasets in a file,
  (2) display a form with three fields, namely,
      "start", "stride", and "end",
  (3) display the data in the dataset.
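A small step toward a less ad hoc model is to give the three modes names. The enum values follow the numbering in the list above; the enum and sds_form_for_mode() are hypothetical, and the returned strings merely stand in for emitting the corresponding form.

```c
#include <string.h>

/* Named constants for the three display modes, numbered as listed. */
enum sds_display_mode {
    SDS_MODE_LIST   = 1,    /* scroll boxes of dataset names */
    SDS_MODE_SUBSET = 2,    /* form with "start", "stride", "end" */
    SDS_MODE_DATA   = 3     /* the data in the dataset */
};

/* Describe the form for the current mode, or return NULL for an unknown
 * state. A switch on named constants reads better than bare numbers and
 * cannot fall into the if (display_mode = 2) assignment trap. */
const char *sds_form_for_mode(int display_mode)
{
    switch (display_mode) {
    case SDS_MODE_LIST:   return "dataset list";
    case SDS_MODE_SUBSET: return "start/stride/end form";
    case SDS_MODE_DATA:   return "data table";
    default:              return NULL;
    }
}
```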

This is currently a very ad hoc model and I am unhappy
with it! It is not a general enough model for supporting
many file format libraries. Also, main.c still has many
queries and statements specific to the HDF file format. We
will be generalizing the SDB and fixing these problems in the
next revision.
