File Structure

An understanding of files and records is essential when discussing file storage; therefore, this section describes files and records as well as different ways they can be accessed.

Files

Files are the basic units in which programs and data are stored; a file must be created and named before anything can be stored in it. Different types of files can be created:

Program files (.PROG).
Data files (.DATA).
Forms files (.FORM).*
Database (DBMS) files.*

* Refer to the corresponding programming manual for instructions on using these files.

Records

Each file contains one or more logical records. These records are established using the CREATE statement, and each record can be from 1 to 999999 bytes long. A logical record is the smallest unit of storage that is directly addressable.

A disk file cannot be larger than the available storage space on the disk or 999999 records, whichever is smaller.
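The size limit can be pictured as a small calculation. The sketch below is written in Python rather than Eloquence BASIC, and the function name and sample figures are hypothetical; it assumes that the effective limit is the smaller of the two bounds.

```python
MAX_RECORDS = 999_999  # record-count ceiling stated in this section


def max_records_for_file(free_bytes: int, record_length: int) -> int:
    """Largest record count a new file could hold (illustrative model only)."""
    if not 1 <= record_length <= 999_999:
        raise ValueError("record length must be between 1 and 999999 bytes")
    # The file is bounded by the free disk space and by the record ceiling.
    return min(free_bytes // record_length, MAX_RECORDS)


small_disk = max_records_for_file(1_000_000, 256)   # disk space is the limit
large_disk = max_records_for_file(10**12, 1)        # record ceiling is the limit
```

On the small disk the free space is the binding limit; on the large disk the record ceiling is.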

EOFs and EORs

Files and logical records are bounded on the storage medium by marks which signify their ends. There are two types of marks--end-of-file (EOF) and end-of-record (EOR).

An EOF is placed at the end of the data in a file by specifying END in a PRINT# statement. The EOF mark takes up two bytes of storage space unless the last data item goes exactly to the end of the file.

An EOR mark can signify the end of data within a logical record. See the PRINT# statement for details.

Data Access Methods

There are three ways to store and retrieve data--serial access, direct access, and direct word access. You determine which method of data access best suits your needs. Since the decision will be based on the amount of available disk storage and the time required for your operations, an understanding of data file structure is necessary for the most efficient use of the system.

For example, suppose you are working with thousands of customer account numbers and their balances due. Your job is to output a daily list of all customers and their balances. In this situation, it is best to pack all data items (customer numbers and balances due) together tightly in a data file to save space on the disk and to save time when accessing the data. This is serial access.

To update individual customer balances, you will need another file containing customer numbers, names, addresses, items purchased, and balances due. The data in this file is arranged so that each individual item (customer name or number) can be accessed. This method of storing data usually takes more space on the disk. The advantage here is that any item can be easily updated since individual items can be accessed much faster. This is direct access.

When you wish to update many individual portions of a file as fast as possible, direct word access can be used. Using this method allows better storage efficiency than direct access.

Serial Access

Data treated as a unit of information (instead of as individual items) can be handled using serial PRINT# and serial READ# statements. When serial PRINT# statements are used to store data on the disk, data items are stored compactly without identifiable marks between items. Together these data items make up a file, and they can span as many records as necessary. Data lists can contain both numerics and strings.

All or part of the information stored originally can be retrieved in one serial READ# statement. The list of data elements read does not have to use the same names as the list originally printed in the file, but the data items must be identical in size, type (numeric or string), and order. (The names and numeric precision you assign to these elements can still vary.) Serial access can begin only at the start of the file.
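The behavior of serial access can be modeled outside the language. The following Python sketch is an illustration only: the function names and the byte layout (8-byte numerics, length-prefixed strings) are assumptions made for the example and are not the Eloquence on-disk format.

```python
import io
import struct


def serial_print(stream, items):
    """Append items back to back, with no separator between them."""
    for item in items:
        if isinstance(item, str):
            data = item.encode("ascii")
            # Assumed layout: 2-byte length followed by the characters.
            stream.write(struct.pack(">H", len(data)) + data)
        else:
            # Assumed layout: every numeric is an 8-byte float.
            stream.write(struct.pack(">d", float(item)))


def serial_read(stream, kinds):
    """Read items in order, always starting at the beginning of the file."""
    stream.seek(0)
    values = []
    for kind in kinds:
        if kind is str:
            (length,) = struct.unpack(">H", stream.read(2))
            values.append(stream.read(length).decode("ascii"))
        else:
            values.append(struct.unpack(">d", stream.read(8))[0])
    return values


f = io.BytesIO()
serial_print(f, [1001, 250.75, "SMITH"])
everything = serial_read(f, [float, float, str])
first_two = serial_read(f, [float, float])  # part of the data, same order
```

The read list may stop early, but it must name the item types in the order in which they were printed, and every read starts again from the beginning of the file.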

Direct Access

When data items are to be handled individually (instead of as a unit), direct PRINT# and direct READ# operations can be used. The same PRINT# and READ# statements are used with an additional parameter to specify a record number. Each data item is stored in one record (or more, if required) so that each data item is directly accessible. Storing data directly may not use storage space efficiently, since an item may fill only part of the record (or records) allocated to it.

Each of the data items stored originally can be retrieved by using a direct READ#. The READ# begins at the start of a specified record. The list of data items does not have to be identical to the list originally printed in the record, but the data items must be identical in size, type (numeric or string), and order. Notice that since the numeric precision need not be the same from PRINT# to READ#, numeric conversion is easily performed.
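Direct access can be modeled as a seek to the start of a fixed-length record. The Python sketch below is illustrative only; the record length, the 1-based record numbering, the 8-byte numeric layout, and the function names are all assumptions made for the example.

```python
import io
import struct

RECORD_LENGTH = 32  # bytes per logical record (hypothetical value)


def direct_print(stream, record_number, value):
    """Store one numeric item at the start of the given record (1-based)."""
    stream.seek((record_number - 1) * RECORD_LENGTH)
    stream.write(struct.pack(">d", value))


def direct_read(stream, record_number):
    """Read the numeric item stored at the start of the given record."""
    stream.seek((record_number - 1) * RECORD_LENGTH)
    return struct.unpack(">d", stream.read(8))[0]


f = io.BytesIO(bytes(RECORD_LENGTH * 10))  # a file of ten empty records
direct_print(f, 7, 125.50)
direct_print(f, 3, 40.0)
balance = direct_read(f, 7)      # record 7 is reached without reading 1 to 6
unused = RECORD_LENGTH - 8       # bytes left idle in each record used
```

Any record can be reached in one step, but in this model 24 of every 32 bytes are left unused, which is the storage cost described above.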

Direct Word Access

When you wish to handle individual data items and also wish to specify the exact point within a record where the data is to be printed or read, use direct word access. This access method is specified by adding another parameter, called a word pointer, to the READ# and PRINT# statements.

Direct word access offers the best accessibility to data, since you specify the exact word at which the read or print begins. Use of disk storage space is good, too, since end-of-record (EOR) marks are not added after the data; therefore, remaining space in the record can be used for more data storage.
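The word pointer can be thought of as an extra offset inside the record. The Python sketch below is a model only: it assumes 2-byte words, 1-based record numbers and word pointers, a 32-byte record, and 8-byte numerics, none of which is stated in this section, and the function names are hypothetical.

```python
import io
import struct

RECORD_LENGTH = 32  # bytes per logical record (hypothetical value)
WORD_SIZE = 2       # bytes per word (assumption)


def word_offset(record_number, word_pointer):
    """Byte position of a word within the file (both arguments 1-based)."""
    return (record_number - 1) * RECORD_LENGTH + (word_pointer - 1) * WORD_SIZE


def word_print(stream, record_number, word_pointer, value):
    """Store one numeric item starting at the given word of the record."""
    stream.seek(word_offset(record_number, word_pointer))
    stream.write(struct.pack(">d", value))


def word_read(stream, record_number, word_pointer):
    """Read one numeric item starting at the given word of the record."""
    stream.seek(word_offset(record_number, word_pointer))
    return struct.unpack(">d", stream.read(8))[0]


f = io.BytesIO(bytes(RECORD_LENGTH * 4))  # a file of four empty records
word_print(f, 2, 1, 10.0)   # occupies words 1 to 4 of record 2
word_print(f, 2, 5, 20.0)   # occupies words 5 to 8 of the same record
word_print(f, 2, 5, 25.0)   # update the second item only
```

Because nothing is written after each item, several items share one record, and any one of them can be rewritten without touching its neighbors.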

Comparing Data Access Methods

As mentioned before, you must decide which data access method best suits your particular needs. This decision is rarely easy, because each method has advantages and disadvantages. For example, more efficient use of storage space generally comes at the cost of longer access time, and vice versa. The choice is difficult to change later, so make it carefully.

The advantages and disadvantages of accessing data with each method are summarized below:

Comparison of Data Access Methods
	Access Time	Storage Efficiency
Serial	Varies - longer for higher-numbered records	Best - data is packed solidly
Direct	Good - direct access to any record	Varies - only part of a record may be used
Direct Word	Best - only part of a record need be accessed	Good
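The storage side of this comparison can be estimated with simple arithmetic. The Python sketch below uses hypothetical figures (1000 numeric items of 8 bytes each and 32-byte records) and ignores EOF marks, EOR marks, and any other overhead, so it is a rough model rather than a measurement.

```python
import math


def bytes_used(item_count, record_length, items_per_record):
    """Disk bytes consumed when each record holds items_per_record items."""
    records = math.ceil(item_count / items_per_record)
    return records * record_length


ITEMS = 1000         # number of data items (hypothetical)
ITEM_SIZE = 8        # bytes per item (assumption)
RECORD_LENGTH = 32   # bytes per logical record (hypothetical)

serial_bytes = ITEMS * ITEM_SIZE                        # items packed solidly
direct_bytes = bytes_used(ITEMS, RECORD_LENGTH, 1)      # one item per record
word_bytes = bytes_used(ITEMS, RECORD_LENGTH, RECORD_LENGTH // ITEM_SIZE)
```

In this idealized case direct access uses four times the space of serial storage. Direct word access matches serial packing here only because the record length is an exact multiple of the item size; with other sizes, some space at the end of each record would go unused.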

Eloquence Language Manual - 19 DEC 2002