Appendix A  NCSA HDF Tags

Overview

This appendix contains a complete description of HDF tags that have been assigned at NCSA as of February 1989. Most of these tags are represented by two- or three-character uppercase names, such as ND or FID. In the NCSA software, the identifiers that refer to these names are preceded by DFTAG_ and are defined in the files df.h and dfF.h, which are listed in Appendix B.

These listings are grouped approximately according to the roles that the tags play, under the headings Utility Tags, Raster-8 Tags, General Raster Image Tags, and so forth. These groupings imply a general context for the use of each tag, but are not meant to restrict the use of the tags to any particular context.

Utility Tags

ND  No data  0 bytes  (xxx)
This tag is used for place holding and to fill empty portions of the data descriptor block. The length and offset fields of an ND DD must be equal to zero.

FID  File identifier  string  (100)
This tag points to a string which the user wants to associate with this file. The string is intended to be a user-supplied title for the file.

FD  File descriptor  text  (101)
This tag points to a block of text describing the overall file contents. The text can be any length, in the standard text format for HDF. The text is intended to hold user-supplied comments about the file.

TID  Tag identifier  string  (102)
The data for this tag is a string that identifies the functionality of the tag indicated in the reference number. For example, the tag identifier for tag identifier would point to data that reads "tag identifier". The reference number of a tag identifier is therefore not an ordinary reference number: it is the number of the tag being identified.
Many tags are identified in the HDF specification, so it is usually unnecessary to include their identifiers in the HDF file. But with user-defined tags or special-purpose tags, the only way for a human reader to diagnose what kind of data is stored in a file is to read tag identifiers. Use tag descriptors to define even more detail about your user-defined tags.
Note that tag identifiers also let you check user-defined tags for consistency. Although two people may use the same user-defined tag, they probably will not use the same tag identifier.

TD  Tag descriptor  text  (103)
The data for this tag is a text block which describes in relative detail the functionality and format of the tag which is indicated in the reference number. This tag is mainly intended to be used with user-defined tags and provides a medium for users to exchange files that include human-readable descriptions of the data.
It is important to provide everything that a programmer might need to know to read the data from your user-defined tag. At the minimum, you should specify everything you would need to know in order to retrieve your data at a later date if the original program were lost.

DIL  Data identifier label  string  (104)
The data for this tag is a data identifier, made up of a tag and a reference number, followed by a string that the user wants to place in the file. The purpose of this tag is to associate the string with the data identifier as a label for whatever that data identifier points to in turn.
With DIL, any data identifier can be labeled. Each data identifier points to some data in the file, so by including DILs you can give any piece of data a label for future reference. For example, DIL is often used to give titles to images.

DIA  Data identifier annotation  text  (105)
The data for this tag is a data identifier, made up of a tag and a reference number, followed by a text block that the user wants to place in the file. Its purpose is to associate the text block with the data identifier as annotation for whatever that data identifier points to in turn.
With DIA, any data identifier can have a lengthy, user-written description of why that data is in the file. This can be used to include user comments about images, data sets, source code, and so forth.
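
The DIL and DIA layout described above (a data identifier followed by a string or text block) can be sketched as follows. This is only an illustration, not NCSA library code. It assumes that the tag and the reference number are each stored as a 16-bit unsigned integer in big-endian order, which this appendix does not state explicitly, and the function names parse_annotation and build_annotation are hypothetical.

```python
import struct

# Assumption: a data identifier is a 16-bit tag followed by a 16-bit
# reference number, both unsigned and big-endian.
_DATA_ID = struct.Struct(">HH")


def build_annotation(tag: int, ref: int, text: str) -> bytes:
    """Build a DIL or DIA payload: data identifier, then the text."""
    return _DATA_ID.pack(tag, ref) + text.encode("ascii")


def parse_annotation(payload: bytes):
    """Split a DIL or DIA payload into (tag, ref, text)."""
    if len(payload) < _DATA_ID.size:
        raise ValueError("payload too short to hold a data identifier")
    tag, ref = _DATA_ID.unpack_from(payload, 0)
    # Everything after the data identifier is the label or annotation.
    text = payload[_DATA_ID.size:].rstrip(b"\0").decode("ascii", errors="replace")
    return tag, ref, text


if __name__ == "__main__":
    # Label the raster image group (tag 306) with reference number 1.
    payload = build_annotation(306, 1, "Vortex, time step 40")
    print(parse_annotation(payload))
```

The same routine serves both tags, because DIL and DIA differ only in whether the trailing data is a short label or a longer text block.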

NT  Number type  4 bytes  (106)
NT consists of four fields of 1 byte each, as shown in Table A.1.

Table A.1  Number Type Fields

Field     Contents
VERSION   version number of NT information, currently=1
TYPE      unsigned int, signed int, unsigned char, char, float, double
WIDTH     number of bits (assumed all significant)
CLASS     a generic value, with different interpretations depending on type: floating point, integer, or character

Some possible values that may be included for each of the three types in the field CLASS are listed in Table A.2.

Table A.2  Number Type Values

Type     Possible Values
floats   IEEE floating point, VAX floating point, CRAY floating point
ints     VAX byte order, Intel byte order, Motorola byte order
chars    ASCII, EBCDIC

The number type flag is used by any other element in the file to indicate specifically what a numeric value looks like. Other tag types should contain a reference number pointer to an NT tag instead of containing their own number type definitions. If an MT element is present in the file, then all NTs can be assumed to be of the appropriate default types for that machine, unless required by definition.
Defining a generic, always applicable set of fields for NT was considered too difficult to implement in the early versions. We are therefore defining version 1 of the NT tag, which contains only four fields. We have a draft of the version 2 tag definition, which starts with the same four fields and adds fields that can define types of numbers not currently listed. Version 1 NT implementations should check the version number field only to confirm it, and must always write a 1 into this field. Version 2 implementations will have to be backward compatible in as many cases as possible.

MT  Machine type  0 bytes  (107)
The MT tag specifies that all unconstrained or partially constrained values in this HDF file are of the default type for that hardware.
When the MT tag is set to VAX, for example, all integers will be assumed to be in VAX byte order unless specifically defined otherwise with an NT tag. Note that all of the headers and many tags, the whole raster-8 set for example, are defined with bit-wise precision and will not be overridden by the MT setting.
For MT, the reference field itself is the encoding of the MT information. The reference field is 16 bits, taken as four groups of four bits, specifying the types for unsigned char, unsigned int, float, and double respectively. This allows 16 generic specifications for each type. To the user, these will be defined constants in df.h, specifying the proper descriptive numbers for Sun, VAX, CRAY, Alliant, and other computer systems.
If there is no MT tag in a file, the application may assume that the data in the file was written on the local machine and that any portability problems are taken care of by the user. For this reason, we recommend that all HDF files contain an MT tag for maximum portability.
Possible machine types are shown in Table A.3.

Table A.3  Possible Machine Types

Type     Possible Machines
floats   IEEE32, VAX32, CRAY64
ints     VAX32, Intel16, Intel32, Motorola32, CRAY64
chars    ASCII, EBCDIC
double   IEEE64, VAX64, CRAY128

Each of these lists can be extended as we find a need for new types.

RLE  Run length encoded data  0 bytes  (11)
This tag is used in the ID compression field and other places to indicate that an image or section of data is encoded with a run-length encoding scheme. The RLE method used is byte-wise. The low seven bits of the count byte indicate the number of bytes (n). The high bit of the count byte indicates whether the next byte should be replicated n times (high bit=1), or whether the next n bytes should be included as is (high bit=0).
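
The byte-wise scheme just described can be illustrated with a short decoder. This is a minimal sketch written directly from the description above, not the NCSA implementation; the function name rle_decode is hypothetical, and raising an error on truncated input is a choice made here rather than something the specification requires.

```python
def rle_decode(data: bytes) -> bytes:
    """Decode a byte-wise run-length encoded stream.

    Each count byte holds n in its low seven bits. If the high bit is
    set, the next byte is replicated n times. If it is clear, the next
    n bytes are copied as is.
    """
    out = bytearray()
    i = 0
    while i < len(data):
        count = data[i]
        n = count & 0x7F          # low seven bits: number of bytes
        i += 1
        if count & 0x80:          # high bit=1: replicate the next byte
            if i >= len(data):
                raise ValueError("run is missing its data byte")
            out.extend(data[i:i + 1] * n)
            i += 1
        else:                     # high bit=0: copy the next n bytes
            if i + n > len(data):
                raise ValueError("literal block is truncated")
            out.extend(data[i:i + n])
            i += n
    return bytes(out)


if __name__ == "__main__":
    # 0x83 0x41 -> "A" three times; 0x02 0x42 0x43 -> "BC" copied as is.
    encoded = bytes([0x83, 0x41, 0x02, 0x42, 0x43])
    print(rle_decode(encoded))  # prints b'AAABC'
```

Because n occupies seven bits, a single count byte can describe at most 127 bytes, so longer runs have to be split across several count bytes.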
See also: RIG (Raster Image Group)

IMC  IMCOMP compressed data  0 bytes  (12)
This tag is used in the ID compression field and other places to indicate that an image or section of data is encoded with an IMCOMP encoding scheme. This scheme is a 4:1 areal averaging method which is easy to decompress. It counts color frequencies in 4x4 squares to optimize color sampling.
See also: RIG (Raster Image Group)

Raster-8 (8-Bit Only) Tags

ID8  Image dimension-8  4 bytes  (200)
The data for this tag consists of two 16-bit integers representing the width and height of an 8-bit raster image in bytes.

IP8  Image palette-8  768 bytes  (201)
The data for this tag consists of 256 triples of bytes. The three bytes of each triple are the red, green, and blue elements of one entry in the 256-entry palette, respectively. The first triple is palette entry 0 and the last is palette entry 255.

RI8  Raster image-8  m x n bytes  (202)
The data for this tag is a row-wise representation of the elementary 8-bit image data. The data is stored width-first (hence row-wise) and is 8 bits per pixel. The first byte of data represents the pixel in the upper-left corner of the image.

CI8  Compressed image-8  ? bytes  (203)
The data for this tag is a row-wise representation of the elementary 8-bit image data. Each row is compressed using the following run-length encoding, where n is the low seven bits of the count byte. The high bit indicates whether the following n bytes will be reproduced exactly (high bit=0) or whether the following byte will be reproduced n times (high bit=1). Because CI8 and RI8 are basically interchangeable, we suggest that you not have a CI8 and an RI8 with the same reference number.

II8  IMCOMP image-8  ? bytes  (xxx)
The data for this tag is a 4:1 compressed 8-bit image, using the IMCOMP compression scheme.

General Raster Image Tags

RIG  Raster image group  n*4 bytes  (306)
The raster image group (RIG) data is a list of data identifiers (tag/ref) that describe a raster image.
All of the members of the group are required to display the image correctly. Application programs that deal with RIGs should read all the elements of a RIG and process those identifiers which they can display correctly. Even if the application cannot process all of the tags, the tags that it can process will be displayable.
Tag types that may appear in a RIG are listed in Table A.4.

Table A.4  Possible Tag Types in a RIG

Tag   Description
ID    image dimension
RI    raster image
XYP   X-Y position
LD    LUT dimension
LUT   color lookup table
MD    matte channel dimension
MA    matte channel
CCN   color correction
CFM   color format
AR    aspect ratio
MTO   machine-type override

Example: ID, RI, LD, LUT
An image dimension record, the raster image, an LUT dimension, and the LUT go together. The application reads the image dimensions, then reads the image with those dimensions. It also reads the lookup table according to its dimensions and displays the corresponding image.

ID, LD, MD
Image dimension  20 bytes  (300)
LUT dimension  20 bytes  (307)
Matte dimension  20 bytes  (308)
These three dimension records have exactly the same format. They define the dimensions of the 2D array to which each refers. ID specifies the dimensions of an RI tag, LD specifies the dimensions of an LUT tag, and MD specifies the dimensions of an MA tag. The fields are defined as shown in Table A.5.

Table A.5  Dimension Record Fields

Field                   Definition
X-dimension (width)     32-bit integer
Y-dimension (height)    32-bit integer
NT/ref (element type)   tag and ref=32 bits
Elements per node       16-bit integer
Interlace scheme        16-bit integer (0, 1, or 2)
Compression tag         tag and ref=32 bits

For example, a 512x256 row-wise 24-bit raster image with each pixel stored as RGB bytes would have the following values:

X: 512, Y: 256

In this case, NT specifies an 8-bit integer. There are three elements per node, one each for red, green, and blue. Interlace=0 indicates that the RGB values are not separated. Compression=0 means no compression scheme is used.
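
The 20-byte dimension record in Table A.5 can be packed and unpacked as sketched below. This is an illustration under stated assumptions, not NCSA library code: the fields are assumed to be stored in the order listed, in big-endian byte order, with each tag/ref pair held as two 16-bit unsigned integers. The names DimRecord, pack_dims, and unpack_dims are hypothetical, and the NT reference number 1 in the example is made up.

```python
import struct
from collections import namedtuple

# Assumed layout (big-endian): X, Y as 32-bit integers; NT tag and ref
# as 16-bit each; elements per node and interlace as 16-bit integers;
# compression tag and ref as 16-bit each. Total: 20 bytes.
_DIMS = struct.Struct(">iiHHhhHH")

DimRecord = namedtuple(
    "DimRecord",
    "xdim ydim nt_tag nt_ref elements interlace comp_tag comp_ref",
)


def pack_dims(rec: DimRecord) -> bytes:
    """Serialize a dimension record (ID, LD, or MD) to 20 bytes."""
    return _DIMS.pack(*rec)


def unpack_dims(data: bytes) -> DimRecord:
    """Parse a 20-byte dimension record."""
    if len(data) != _DIMS.size:
        raise ValueError("a dimension record is exactly 20 bytes")
    return DimRecord(*_DIMS.unpack(data))


if __name__ == "__main__":
    # The 512x256 RGB example: NT tag 106 with a hypothetical ref of 1,
    # three elements per node, interlace 0, no compression.
    rec = DimRecord(512, 256, 106, 1, 3, 0, 0, 0)
    raw = pack_dims(rec)
    print(len(raw), unpack_dims(raw))
```

Since ID, LD, and MD share one format, a single pair of routines covers all three records. Only the tag under which the record is stored tells the reader which array the dimensions describe.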

RI  Raster image  x*y bytes  (302)
This tag points to raster image data. It must be stored as specified in an ID tag.
Interlace=0 means each pixel is contiguous
Interlace=1 means each element is grouped by scan lines
Interlace=2 means each element is grouped by planes

LUT  Lookup table  ? bytes  (301)
The LUT, sometimes called a palette, is used by many kinds of hardware to assign RGB colors or HSV colors to data values. When a raster image consists of data values which are going to be interpreted through hardware with a LUT capability, the LUT should be loaded along with the image.
The most common lookup table will have X dimension=256 and Y dimension=1 with three elements per entry, one each for red, green, and blue. The interlace will be either 0, where the LUT values are given as RGB, RGB, RGB . . . , or 1, where the LUT values are given as 256 reds, 256 greens, 256 blues.

CCN  Color correction  52 bytes (usually)  (310)
Color correction specifies the gamma correction for the image and the color primaries used in the generation of the image. The fields, in order, are shown in Table A.6.

Table A.6  Color Correction Field Types

Correction Field   Type
Gamma              float
Red (X,Y,Z)        three floats
Green (X,Y,Z)      three floats
Blue (X,Y,Z)       three floats
White (X,Y,Z)      three floats

CFM  Color format  string  (311)
The color format is a clue to how each element of each pixel in a raster image should be interpreted. It is defined to be a string which is in all caps, and is one of the values shown in Table A.7.

Table A.7  Color Format String Values

String     Description
VALUE      pseudo-color, or just a value associated with the pixel
RGB        red, green, blue model
XYZ        color-space model
HSV        hue, saturation, value model
HSI        hue, saturation, intensity
SPECTRAL   spectral sampling method

AR  Aspect ratio  4 bytes  (312)
The data for this tag is the visual aspect ratio for this image. The image should be visually correct if displayed on a screen with this aspect ratio.
The data consists of one floating point number which represents width divided by height. An aspect ratio of 1.0 indicates a display with perfectly square pixels. An aspect ratio of 1.33 is a standard used by many monitors.

Composite Image Tag

DRAW  Draw  n*4 bytes  (400)
The data for this tag is a list of data identifiers (tag/ref) which define a composite image. Each member of the DRAW data should be displayed, in order, on the screen. This can be used to indicate several RIGs which should be displayed simultaneously, or even to include vector overlays, like T14, which should be placed on top of a RIG.
Some of the elements in a DRAW list will be instructions about how images are to be composited (XOR, source put, anti-aliasing, etc.). These are defined as individual tags.
See also: RIG (Raster image display)
          T14 (Tektronix 4014 for overlay)

Composite Raster Image Tag

XYP  XY position  8 bytes  (500)
An XY position is used in composites and other groups to indicate an XY position on the screen. For this, (0,0) is the lower left, X is the number of pixels to the right along the horizontal axis, and Y is the number of pixels up along the vertical axis. The X and Y pixel positions are given as two 32-bit integers.
For example, if XYP is present inside a RIG, the XYP refers to the position of the lower left corner of the raster image on the screen.
See also: DRAW (Drawing from a list of elements)

Vector Image Tags

T14  Tektronix 4014  n bytes  (602)
This tag points to a Tektronix 4014 data stream. The bytes in the data field, when read and sent to a Tektronix 4014 terminal, will display a vector image. Only the low seven bits of each byte are significant. There are no record markings or non-Tektronix codes in the data.

T105  Tektronix 4105  n bytes  (603)
This tag points to a Tektronix 4105 data stream. The bytes in the data field, when read and sent to a Tektronix 4105 terminal, will be displayed as a vector image. Only the low seven bits of each byte are significant.
Some terminal emulators will not correctly interpret every feature of the Tektronix 4105 terminal, so you may wish to use only a subset of the possible Tektronix 4105 vector commands.

Scientific Data Set Tags

SDG  Scientific data group  n*4 bytes  (700)
The scientific data group (SDG) data is a list of data identifiers (tag/ref) that describe a scientific data set. All of the members of the group provide information for correctly interpreting and displaying the data. Application programs that deal with SDGs should read all of the elements of an SDG and process those identifiers which they can use. Even if an application cannot process all of the tags, the tags that it can use will be displayable.
Tag types that may appear in an SDG are listed in Table A.8.

Table A.8  Possible Tag Types in an SDG

Tag   Description
SDD   scientific data dimension record (rank and dimensions)
SD    scientific data
SDS   scales
SDL   labels
SDU   units
SDF   formats
SDM   maximum and minimum values
SDC   coordinate system
SDT   transposition

Example: SDD, SD, SDM
A dimension record, the scientific data, and the maximum and minimum values of the data go together. The application reads the rank and dimensions from the dimension record, then reads the data array with those dimensions. If it needs the maximum and minimum, it also reads them.

SDD  Scientific data dimension record  16 + 4*rank bytes  (701)
This record defines the rank and dimensions of the array in the scientific data set. The fields are defined as shown in Table A.9.

Table A.9  Scientific Data Dimension Record Fields

Field                    Definition
Rank                     16-bit integer
Dimension sizes          32-bit integers
Data NT (number type)    4 bytes
Scale NTs                rank*4 bytes

For example, an SDD for a 500x600x3 array of floating point numbers would have the following values and components:

Rank: 3
Dimensions: 500, 600, and 3
One data NT
Three scale NTs

SD  Scientific data  4*x*y*z*... bytes (x, y, z, etc. are the dimensions)  (702)
This tag points to an array of scientific data.
The type of the data may be specified by an NT tag included with the SDG. If there is no NT tag, the type of the data is floating point in standard IEEE 32-bit format. The rank and dimensions must be stored as specified in the corresponding SDD tag.

SDS  Scientific data scales  n+4*(x+1+y+1+z+1+...) bytes  (703)
This tag points to the scales for the data set. The first n bytes indicate whether there is a scale for the corresponding dimension (1=yes, 0=no). These are followed by the scale values for each dimension.

SDL  Scientific data labels  ? bytes  (704)
This tag points to a list of labels for the data and each dimension of the data set. Each label is a string terminated by a null byte.

SDU  Scientific data units  ? bytes  (705)
This tag points to a list of strings specifying the units for the data and each dimension of the data set. Each unit string is terminated by a null byte.

SDF  Scientific data format  ? bytes  (706)
This tag points to a list of strings specifying an output format for the data and each dimension of the data set. Each format string is terminated by a null byte.

SDM  Scientific data max/min  8 bytes  (707)
This record contains the maximum and minimum data values in the data set. It consists of two floating point numbers.

SDC  Scientific data coordinates  ? bytes  (708)
This tag points to a string specifying the coordinate system for the data set. The string is terminated by a null byte.

SDT  Scientific data transpose  0 bytes  (709)
The presence of this tag indicates that the data pointed to by the SD tag is in column-major order, instead of the default row-major order. No data is associated with this tag.

NCSA HDF Specifications: NCSA HDF Tags
National Center for Supercomputing Applications, March 1989 / February 1989