Chapter 4	HDF Conventions


Chapter Overview
Byte Order
Naming and Assigning Tags
Using Reference Numbers to Organize Data Objects
Multiple References and File Compaction

Chapter Overview

The specification of HDF described in the previous chapter is not 
sufficient to guarantee its success. With a general structure such 
as HDF to work with, you may be tempted to create overly clever, 
obscure, and complex files, defeating the purpose behind the design
of HDF: to facilitate the interchange of data among differing
applications, programs, and computer systems. To guard against
these problems, a number of guidelines on the use of HDF are
provided. Some are implicit in the discussions in other sections of
this document, and others are presented in the NCSA HDF manual.
Additional guidelines not covered elsewhere are introduced in this 
section.


Byte Order

The byte ordering of data stored in memory and in files varies
from machine to machine. It is important to address this
possibility in any software that processes data from files that might
have visited more than one machine.

In most cases, the existence of "byte-swapped" data can be detected
by examining the Machine Type (MT) tag, which is described in
Appendix A. Unfortunately, a program cannot look at the MT
until it has opened an HDF file and read the header, so the byte
order of the header must be the same no matter which machine
created the file. Therefore, to maintain portability when developing
software for machines that swap bytes, make sure that the four
header characters are read and written in exactly the order
specified (^N^C^S^A).
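The following C sketch shows one way to follow this rule. It writes and checks the four header characters one byte at a time, so the bytes land on disk in the same order on every machine. Writing the same value as a single 32-bit integer with fwrite would instead store it in the host's byte order, which is exactly the machine dependence described above. The names HDF_MAGIC, write_hdf_magic, and is_hdf_header are hypothetical and are not part of the HDF library.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical constant: the header characters ^N ^C ^S ^A,
 * i.e. control-N, control-C, control-S, control-A. */
static const unsigned char HDF_MAGIC[4] = { 0x0E, 0x03, 0x13, 0x01 };

/* Write the header byte by byte, so the on-disk order never depends on
 * the host's integer byte order. Returns 0 on success, -1 on error. */
int write_hdf_magic(FILE *fp)
{
    return fwrite(HDF_MAGIC, 1, sizeof HDF_MAGIC, fp) == sizeof HDF_MAGIC
               ? 0 : -1;
}

/* Read four bytes and compare them, in order, with the expected header.
 * Returns 1 if they match, 0 otherwise (including a short read). */
int is_hdf_header(FILE *fp)
{
    unsigned char buf[4];
    if (fread(buf, 1, sizeof buf, fp) != sizeof buf)
        return 0;
    return memcmp(buf, HDF_MAGIC, sizeof buf) == 0;
}
```

Because the check compares bytes rather than a multi-byte integer, a program can recognize an HDF file before it knows anything about the machine that wrote it.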


Naming and Assigning Tags

Tags that are to be made available to a general population of HDF 
users should be assigned and controlled by NCSA. Tags of this type 
are given numbers in the range 8,000 - 15,999. If you have an 
application that fits this criterion, contact NCSA at the address 
listed on the README page at the beginning of this manual and 
specify the tags you would like. For each tag, your specifications 
should include a suggested name, information about the type and 
structure of the data that the tag will refer to, and information about 
how the tag will be used. Your specifications should be similar to 
those contained in Appendix A. NCSA will assign you a set of tags 
for your application and include your tag descriptions in its 
documentation.

Tags in the range 16,000 - 32,000 are user-definable. That is, you 
can assign them for any private application. Of course, if you use 
tags in this range you need to be aware that they may conflict with 
other people's private tags.
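As a sketch of how a program might respect these ranges, the hypothetical function classify_tag below reports which of the two ranges a tag number falls in. It covers only the ranges described in this section; any other tag number is reported as TAG_OTHER rather than given a meaning the text does not state.

```c
/* Hypothetical classification of tag numbers by the ranges in this section. */
typedef enum {
    TAG_OTHER,          /* not covered by the two ranges below */
    TAG_NCSA_ASSIGNED,  /* 8,000 - 15,999: assigned and controlled by NCSA */
    TAG_USER_DEFINED    /* 16,000 - 32,000: private, may collide with others */
} tag_class;

tag_class classify_tag(unsigned int tag)
{
    if (tag >= 8000 && tag <= 15999)
        return TAG_NCSA_ASSIGNED;
    if (tag >= 16000 && tag <= 32000)
        return TAG_USER_DEFINED;
    return TAG_OTHER;
}
```

A private application could use such a check to refuse, at compile or run time, any tag it defines outside the user-definable range.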

Using Reference Numbers to Organize Data Objects

Reference numbers distinguish between one data object and others 
with the same tag. It is often tempting to assign meaning to 
reference numbers beyond this distinction. Such uses of reference 
numbers are generally discouraged because most HDF software 
assumes that reference numbers can be chosen or assigned in any 
order, as long as all tag/ref combinations in a file remain unique.
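The only requirement, then, is that each tag/ref pair be unique. The sketch below checks that requirement for an in-memory list of identifiers; the tag_ref type and the tag_refs_unique function are hypothetical. Note that it places no condition on the order in which the reference numbers were assigned.

```c
#include <stdlib.h>

/* Hypothetical identifier: one tag/ref pair. */
typedef struct {
    unsigned int tag;
    unsigned int ref;
} tag_ref;

/* Order by tag, then by ref, so duplicates end up adjacent. */
static int cmp_tag_ref(const void *a, const void *b)
{
    const tag_ref *x = a, *y = b;
    if (x->tag != y->tag)
        return x->tag < y->tag ? -1 : 1;
    if (x->ref != y->ref)
        return x->ref < y->ref ? -1 : 1;
    return 0;
}

/* Returns 1 if no tag/ref pair occurs twice, 0 otherwise.
 * Sorts ids in place. */
int tag_refs_unique(tag_ref *ids, size_t n)
{
    qsort(ids, n, sizeof *ids, cmp_tag_ref);
    for (size_t i = 1; i < n; i++)
        if (cmp_tag_ref(&ids[i - 1], &ids[i]) == 0)
            return 0;
    return 1;
}
```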

On the other hand, in some situations reference numbers can be
used easily and effectively to add meaning to a file, for instance,
to provide informal grouping and ordering of data objects. For
example, suppose you want to create a movie out of the images in an 
HDF file. The images need to be ordered in some way and each 
image needs to be associated with other data objects. In such cases, 
you should refer to the following rules to assign reference 
numbers, especially among sets of tags that are required to go 
together.

1.	Assign reference numbers in increasing order.
2.	When a companion object is required (for example, image
dimensions are needed for images), use the object with the
current reference number. If none exists, use the object with
the next lowest reference number.
3.	When several combinations are possible, step through the
reference numbers for each tag type in ascending order, forming
every combination that does not break that order.

For example, suppose the following data identifiers are in a file:  
IP8[1], ID8[1], RI8[1], RI8[2], IP8[2], IP8[3], IP8[4], RI8[4], RI8[10], 
DIL[1], DIL[10]. Objects in the file should be grouped in the order 
shown in Table 4.1.


Table 4.1	Sample Grouping of Data Objects in an HDF File

IP8  ID8  RI8  DIL   Description
 1    1    1    1    palette[1], dimensions[1], image[1], label[1]
 2    1    2    1    change to palette[2] and image[2]
 3    1    2    1    change to palette[3]
 4    1    4    1    change to palette[4] and image[4]
 4    1   10   10    change to image[10] and label[10]
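The three rules can be read as follows: collect the distinct reference numbers used by any of the tags, step through them in ascending order, and at each step use, for every tag, the object with the current reference number or, if there is none, the next lowest. The C sketch below implements that reading. The names tag_refs, distinct_refs, and group_row are hypothetical, and the sketch assumes each tag's reference numbers are supplied in ascending order.

```c
#include <stddef.h>

/* Hypothetical layout: the reference numbers present for one tag. */
typedef struct {
    const char *name;          /* e.g. "IP8" */
    const unsigned int *refs;  /* assumed to be in ascending order */
    size_t count;
} tag_refs;

/* Rule 2: use the object with the current reference number, or else the
 * next lowest one. Returns 0 (used here to mean "none") if no ref <= current. */
static unsigned int pick_ref(const tag_refs *t, unsigned int current)
{
    unsigned int best = 0;
    for (size_t i = 0; i < t->count && t->refs[i] <= current; i++)
        best = t->refs[i];
    return best;
}

/* Rules 1 and 3: gather the distinct reference numbers used by any tag,
 * in ascending order. Stops when out[] holds max values; returns the count. */
size_t distinct_refs(const tag_refs *tags, size_t ntags,
                     unsigned int *out, size_t max)
{
    size_t n = 0;
    for (size_t t = 0; t < ntags; t++) {
        for (size_t i = 0; i < tags[t].count; i++) {
            unsigned int r = tags[t].refs[i];
            size_t pos = 0;
            while (pos < n && out[pos] < r)
                pos++;
            if (pos < n && out[pos] == r)
                continue;              /* already collected */
            if (n == max)
                return n;              /* out[] is full */
            for (size_t k = n; k > pos; k--)
                out[k] = out[k - 1];   /* shift up to keep out[] sorted */
            out[pos] = r;
            n++;
        }
    }
    return n;
}

/* Fill row[t] with the reference number chosen for tag t at this step. */
void group_row(const tag_refs *tags, size_t ntags, unsigned int current,
               unsigned int *row)
{
    for (size_t t = 0; t < ntags; t++)
        row[t] = pick_ref(&tags[t], current);
}
```

With the identifiers from the example (IP8 refs 1, 2, 3, 4; ID8 ref 1; RI8 refs 1, 2, 4, 10; DIL refs 1, 10), distinct_refs yields the steps 1, 2, 3, 4, 10, and calling group_row at each step reproduces the five rows of Table 4.1, ending with (4, 1, 10, 10).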


Of course, these conventions only work if all of the programs that 
touch a file do so without altering the reference numbering 
scheme. If this cannot be guaranteed, you have a second
solution: impose order on the file by means of a sorted, keyed
index. This is a good solution because it can provide random 
access as well as ordered access to data; however, it adds a level of 
complexity to what might otherwise be a simple sequential file 
organization.
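A minimal in-memory version of such an index is sketched below, assuming each entry carries an application-chosen key (for example, a frame number in a movie) along with the tag/ref pair it locates. The index_entry layout and the function names are hypothetical; this section does not define an index format. Sorting once gives ordered access, and a binary search gives random access by key, regardless of how the reference numbers were assigned.

```c
#include <stdlib.h>

/* Hypothetical index entry: a key chosen by the application plus the
 * tag/ref of the data object it locates. */
typedef struct {
    unsigned int key;  /* e.g. frame number */
    unsigned int tag;
    unsigned int ref;
} index_entry;

static int cmp_key(const void *a, const void *b)
{
    unsigned int x = ((const index_entry *)a)->key;
    unsigned int y = ((const index_entry *)b)->key;
    return (x > y) - (x < y);
}

/* Sort by key once; walking the array afterwards gives ordered access. */
void index_sort(index_entry *idx, size_t n)
{
    qsort(idx, n, sizeof *idx, cmp_key);
}

/* Random access: binary search for one key in an index already sorted
 * with index_sort. Returns NULL if the key is absent. */
const index_entry *index_find(const index_entry *idx, size_t n,
                              unsigned int key)
{
    index_entry probe = { key, 0, 0 };
    return bsearch(&probe, idx, n, sizeof *idx, cmp_key);
}
```

Because the order lives in the keys, a program that renumbers references only has to update the tag/ref fields of the affected entries.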

Multiple References and File Compaction

Multiple references to a single data element are quite common.  
The general purpose routine DFdup (see Chapter 2) generates a new 
reference to data that is already pointed to by another DD.  If DFdup 
is used several times, there could be several DDs that point to the 
same data element.  It is important to note that when such a 
multiply referenced data element is moved, the various DDs that 
pointed to the data element before it was moved are not 
automatically adjusted to point to the data element in its new 
location.

For example, when DFaccess is called for write access, the 
referenced data is copied to the end of the HDF file to allow the user 
to append to that element. The original data is not moved, only 
copied. If there are multiple references to that data, then the old 
references still point to the old data.
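The following model illustrates the problem. It uses a simplified descriptor holding only a tag, reference number, offset, and length. The hypothetical dup_dd imitates the effect the text attributes to DFdup, and copy_to_end imitates the copy made when an element is opened for write access; neither is the actual HDF routine or its signature.

```c
/* Simplified, hypothetical data descriptor: where an element lives. */
typedef struct {
    unsigned int tag, ref;
    long offset, length;
} dd;

/* Like DFdup as described above: a new DD that points at the same bytes
 * as an existing one. */
dd dup_dd(const dd *src, unsigned int new_tag, unsigned int new_ref)
{
    dd d = *src;
    d.tag = new_tag;
    d.ref = new_ref;
    return d;
}

/* Model of opening an element for write access: its data is copied to the
 * current end of file and only this DD is updated. Any other DD that shared
 * the old location keeps pointing there. Returns the new end of file. */
long copy_to_end(dd *d, long end_of_file)
{
    d->offset = end_of_file;
    return end_of_file + d->length;
}
```

After copy_to_end, the duplicated descriptor still holds the old offset, which is why compaction must treat such descriptors explicitly.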

Consequently, a file should be compacted in the following
manner. Proceed through the data descriptors in order. For each
descriptor, determine whether its data has already been copied. If
not, copy the data into a new file and update the descriptor to point
to the new location. If the data has already been copied, as in the
case of a multiple reference, place the descriptor in the new file as a
multiple reference to the copy already made. Any data that no
longer has a DD pointing to it should not be copied to the new file.
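The compaction procedure can be sketched in C over in-memory buffers. Assumptions, none of which come from the text: the old and new files are byte arrays, two descriptors refer to the same data element exactly when their old offsets and lengths match, and new_data is large enough. The dd type and the compact function are hypothetical. Data that no descriptor references is never visited, so it is dropped automatically.

```c
#include <stdlib.h>
#include <string.h>

/* Simplified, hypothetical data descriptor. */
typedef struct {
    unsigned int tag, ref;
    size_t offset, length;
} dd;

/* Copy each distinct element referenced by dds[] from old_data into
 * new_data, rewriting offsets in place. Descriptors that shared an element
 * in the old file share its single copy in the new one.
 * Returns the number of bytes written, or (size_t)-1 if out of memory. */
size_t compact(const unsigned char *old_data, dd *dds, size_t ndd,
               unsigned char *new_data)
{
    size_t *old_offset = malloc(ndd * sizeof *old_offset);
    size_t used = 0;
    if (ndd > 0 && old_offset == NULL)
        return (size_t)-1;
    for (size_t i = 0; i < ndd; i++) {
        size_t j;
        old_offset[i] = dds[i].offset;  /* remember before rewriting */
        /* Was this element already copied for an earlier descriptor? */
        for (j = 0; j < i; j++)
            if (old_offset[j] == old_offset[i] && dds[j].length == dds[i].length)
                break;
        if (j < i) {
            dds[i].offset = dds[j].offset;  /* multiple reference: reuse copy */
        } else {
            memcpy(new_data + used, old_data + old_offset[i], dds[i].length);
            dds[i].offset = used;           /* first reference: copy the data */
            used += dds[i].length;
        }
    }
    free(old_offset);
    return used;
}
```

The inner search makes this quadratic in the number of descriptors; a real tool might sort by old offset instead, but the simple loop keeps the "proceed through the descriptors in order" structure of the procedure visible.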

After updating elements, you may need to call DFdescriptors 
again to make sure that your copy of the descriptors is up-to-date.

Of course, for most programs you will be writing, you only need to 
create a completely new file, add elements to an existing file, or 
read a pre-existing file. For all of these operations, consistency is 
completely handled by the lower level HDF interface, even in the 
case of multiple references to the same data.

4.1	NCSA HDF Specifications

HDF Conventions	4.1

National Center for Supercomputing Applications

March 1989

                                                                