When an HDF file is created, the file's DD block size is specified; the default is 16 DDs per DD block. As objects are added to the file, their DDs are inserted into the first DD block. When that block fills up, a new DD block is created, stored at some other location in the file, and linked to the previous DD block. If a large number of objects are stored in an HDF file whose DD block size is small, a large number of DD blocks will be needed, and each DD block is likely to be stored on a different disk page.
Consider, for example, an HDF file with 1,000 SDSs and a DD block size of 16. Each SDS could easily require 10 DDs to describe all the objects comprising the SDS, so the entire file might contain 10,000 DDs. This would require 625 (10,000/16) DD blocks, each stored on a different disk page.
Whenever an HDF file is opened, all of the DDs are read into memory. Hence, in our example, 625 disk accesses might be required just to open the file.
Fortunately, we can use this kind of information to improve performance. When we create an HDF file, we can specify the DD block size. If we know that the file will hold many objects, we should choose a large DD block size so that each disk access reads in a large number of DDs and fewer disk accesses are needed. In our example, we might have chosen the DD block size to be 10,000, resulting in only one disk access. (Of course, this example deliberately goes to a logical extreme. For a variety of reasons, a more common approach would be to set the DD block size to something between 1,000 and 5,000 DDs.)
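As a minimal sketch, the DD block size can be specified at creation time through the low-level H interface; the file name and the value of 1,000 DDs per block below are illustrative, not recommendations.

    #include "hdf.h"    /* low-level H interface: Hopen, Hclose */

    int main(void)
    {
        /* Create the file with 1,000 DDs per DD block instead of the
           default 16.  The third argument to Hopen applies only when a
           new file is created. */
        int32 file_id = Hopen("many_objects.hdf", DFACC_CREATE, 1000);
        if (file_id == FAIL)
            return 1;

        /* ... create and write objects as usual ... */

        Hclose(file_id);
        return 0;
    }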
From this discussion we can derive the following rules of thumb for achieving good performance by altering the DD block size.
You can change the linked block size for SDSs by calling the function SDsetblocksize. To change the linked block size for Vdatas, you must edit the hlimits.h file, change the value of HDF_APPENDABLE_BLOCK_LEN, and rebuild the HDF library. Changing the linked block size affects only the linked blocks created after the change is made; it does not affect the size of blocks that have already been written.
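The following sketch shows one way the call might be used, assuming SDsetblocksize is applied to the SD interface identifier returned by SDstart before the appendable data sets are created; the file name and the 64 KB block size are only illustrative.

    #include "mfhdf.h"   /* SD interface: SDstart, SDsetblocksize, SDend */

    int main(void)
    {
        int32 sd_id = SDstart("appendable.hdf", DFACC_CREATE);
        if (sd_id == FAIL)
            return 1;

        /* Ask for 64 KB linked blocks.  Only blocks allocated after this
           call are affected; blocks already written keep the size they
           were created with. */
        if (SDsetblocksize(sd_id, 65536) == FAIL) {
            SDend(sd_id);
            return 1;
        }

        /* ... create and write appendable (unlimited-dimension) data
               sets with SDcreate / SDwritedata as usual ... */

        SDend(sd_id);
        return 0;
    }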
There is a certain amount of overhead associated with linked blocks: every linked block added to a file costs additional block accesses, disk space, and reference numbers. Increasing the linked block size reduces the number of blocks required, and with it the number of block accesses, the disk space used, and the number of reference numbers added to the file. The reduction in reference numbers is sometimes necessary in itself, because the number of reference numbers available in a file is limited.
Linked block size can also affect I/O performance, depending on how the data is accessed. If the data will typically be accessed in large chunks, a large linked block size could improve performance; if the data will be accessed randomly in small amounts, small linked blocks are usually the better choice.
Ideally, one would make the linked block size equal to the amount of data that will typically be accessed at one time. In practice, however, other factors also affect performance, such as the operating system being used, the sector size of the disk being accessed, the amount of memory available, and the actual access patterns.
Here are some rules of thumb for specifying linked block size: