THE B-PLUS PROGRAM A B-TREE INDEXING FILE MODULE FOR C PROGRAMMERS by Hunter and Associates B-PLUS is a versatile, carefully designed module for C programmers who need a fast, efficient program for indexing data files. B-PLUS allows data records to be retrieved based on a key value without regard to their position in the data file. The data records can also be accessed in sequential order in either a forward and reverse direction. The B-PLUS Program Module is based on the famous and widely used b-tree algorithm and has a number of useful extensions which are not found in many programs of this type. Some of its features are the following: - Variable length keys are allowed - File size limited only by DOS or by disk space - All functions are non-recursive so very little stack space is required - The most recently used key values are stored in a cache buffer in main memory for fast access - Duplicate keys are allowed Version 1.1A of the B-PLUS program has been tested for Microsoft C Compilers, Versions 4.0, 5.0, 5.1 and the Borland Turbo C Compiler Version 1.5. The compiled object file is less than 10K bytes in length for these compilers. See the instructions at the end of this user's guide for a special note regarding Microsoft's older C Version 4.0. Version 1.1A has several new features that were not in Version 1.0. The next_key and prev_key routines can now be called immediately after adding or deleting an index key. It is no longer necessary to "reset" the index file with a find_key or locate_key function call after adding or deleting keys. All known bugs have been corrected in Version 1.1A. LICENSE AND REGISTRATION B-PLUS is distributed as a "share ware" program. Please help us get it known by giving unmodified copies of the HUNTER AND ASSOCIATES B-PLUS FILE INDEXING PROGRAM ------------------------------------------------------------- program and documentation to other programmers who may find B-PLUS useful. B-PLUS is copyright (C) 1987 by Hunter and Associates. It is not public domain or free software. Non-registered users are granted a limited license to use B-PLUS on a trial basis for determining whether or not it is suitable for their needs. Registration permits the use of B-PLUS on one CPU and allows the use of the compiled B-PLUS modules in programs for general sale and/or distribution. The registration fee is $25 or $35. Users who pay the $35 fee will be sent a disk containing a fully commented listing of the latest source code, the user documentation, and a number of useful sample programs. Users who pay the $25 fee are not sent a new disk but are included in the mailing list for announcements about both current and future products. Your prompt registration of your copy of the B- PLUS program is appreciated. A trial disk with supporting documentation is available directly from Hunter and Associates for $10. Register your usage of B-PLUS by sending the registra- tion fee to: Hunter and Associates 7900 Edgewater Drive Wilsonville, OR 97070 Telephone: (503) 694-1449 Your comments regarding the B-PLUS program or any suggestions you have for extensions or for other programs that would be useful to you are welcomed. Hunter and Associates makes no warranties whatsoever regarding the B-PLUS computer programs or the supporting documentation. USING B-PLUS IN YOUR PROGRAMS The B-PLUS File Indexing Module contains twelve functions that handle the retrieval of data records by key value. The keys that are used to locate records are null terminated strings. The data structures and constants that are used are defined in the header file bplus.h. If the data record field that you want to use as a key contains numeric data, you can use one of the library Page 2 HUNTER AND ASSOCIATES B-PLUS FILE INDEXING PROGRAM ------------------------------------------------------------- conversion routines (fcvt, evct, sprintf) to convert the data to string format. The connection between a key and its reference is formalized as a structure of type ENTRY. This structure contains three elements: typedef struct { RECPOS idxptr; /* long pointer to next index level */ RECPOS recptr; /* long pointer to the file position of data record */ char key[MAXKEY]; /* with this key value */ } ENTRY; The application program uses only the recptr and key[] fields. The idxptr is used and maintained by the B-PLUS modules. A variable of type IX_DESC is declared for each open index file. See the header file bplus.h if you are interested in the elements of this structure. ENTRY and IX_DESC are the only two data types that are normally used by application programs. Here is a sample program stub which calls the open_index and find_index subroutines. Example: #include "bplus.h" main() { ENTRY e; IX_DESC names; /* open index file called NAMES.IDX */ open_index("NAMES.IDX", &names, 0); /* find an index record for John Smith */ strcpy(e.key, "SMITH JOHN"); if(find_key(&e, &names)) printf("Data record address is %ld", e.recptr); else printf("Cannot find record for that key"); } Page 3 HUNTER AND ASSOCIATES B-PLUS FILE INDEXING PROGRAM ------------------------------------------------------------- Each of the twelve subroutines is now described. int cdecl open_index(name, pix, dup); char *name; File path name IX_DESC *pix; Pointer to index file variable int dup; 0 - no duplicate keys allowed 1 - allow duplicate keys Description: The open_index function is used to open and initialize an existing index file specified by name and prepares the file for subsequent reading or writing. The file structure variable pix is defined in the application program. Duplicate keys are allowed or not allowed depending on whether dup has the value of 0 or 1. The open_index function returns the value IX_OK (1) if the file is opened successfully. If the file cannot be opened, an error message is displayed and the program is aborted. int cdecl make_index(name, pix, dup); char *name; File path name IX_DESC *pix; Pointer to index file variable int dup; 0 - no duplicate keys allowed 1 - allow duplicate keys Description: The make_index function is used to create and initialize a new index file specified by name and to prepare the file for subsequent reading or writing. If a file of this name already exists, its contents are destroyed when the new file is created. The file structure variable pix is defined in the application program. Duplicate keys are allowed or not allowed depending on whether dup has the value of 0 or 1. The make_index function returns the value IX_OK (1) if the file is created successfully. If the file cannot be created, an error message is displayed and the program is aborted. int cdecl close_index(pix); IX_DESC *pix; Pointer to index file variable Description: The close_index file function clears the internal cache buffer and closes the specified index Page 4 HUNTER AND ASSOCIATES B-PLUS FILE INDEXING PROGRAM ------------------------------------------------------------- file. It is very important that each index file be closed. Otherwise data that is stored in the internal cache buffer may be lost and the index file may not be properly updated. The close_index function returns the value IX_OK (1) if the file is successfully closed. int cdecl find_key(pe, pix); ENTRY *pe; Pointer to variable of type ENTRY IX_DESC *pix; Pointer to index file variable Description: The find_key function searches the index file for the key value contained in pe.key. If an exact match is found, the value IX_OK (1) is returned and the location of the data record with this key value is stored in pe.recptr. If an exact match is not found, the value IX_FAIL (0) is returned and pe.recptr is undefined. If the index file contains duplicate keys, the first key is always found. int cdecl locate_key(pe, pix); ENTRY *pe; Pointer to variable of type ENTRY IX_DESC *pix; Pointer to index file variable Description: The locate key function searches the index file for the first key value which is equal to or greater than that stored in pe.key. The location of the data record which is equal to or greater than pe.key is stored in pe.recptr. This function can be used to locate an entry in the index file when only part of the key value is known. If the index file contains duplicate keys, locate_key will locate the first key. The following values are returned by locate_key: IX_OK - the value (1) is returned if an exact match is found IX_FAIL - the value (0) is returned if an exact match is not found EOIX - the value (-2) is returned for end of index if the search key is greater than all keys in the index file and pe.recptr is undefined. Page 5 HUNTER AND ASSOCIATES B-PLUS FILE INDEXING PROGRAM ------------------------------------------------------------- int cdecl add_key(pe, pix); ENTRY *pe; Pointer to variable of type ENTRY IX_DESC *pix; Pointer to index file variable Description: The add_key function adds new entries to the index file. The calling program stores the key value in pe.key and the data record address in pe.recptr. Add_key first looks to see if an entry with this key already exists in the index file. If no entry with this key exists, the new entry is added. If an entry with this key already exists, the new entry is added only if duplicate keys are allowed (as defined by the open_index function). If the entry is successfully added, IX_OK (1) is returned; otherwise IX_FAIL (0) is returned. int cdecl delete_key(pe, pix); ENTRY *pe; Pointer to variable of type ENTRY IX_DESC *pix; Pointer to index file variable Description: The delete_key function deletes entries in the index file. The key to be deleted is stored in pe.key. If duplicate records are allowed, the corresponding data record address must also be stored in pe.recptr. In this case, delete key needs the record number to distinguish entries. If there are not duplicate entries, this field is ignored. If the entry is successfully deleted, IX_OK (1) is returned; otherwise IX_FAIL (0) is returned. The space that was occupied by the entry is marked as free for reused by B_PLUS. int cdecl first_key(pix); IX_DESC *pix; Pointer to index file variable Description: The first_key function positions the index file pointer to the beginning of the index file. The function next_key can then be used to list the file in key order. The first_key function returns the value IX_OK (1). Page 6 HUNTER AND ASSOCIATES B-PLUS FILE INDEXING PROGRAM ------------------------------------------------------------- int cdecl last_key(pix); IX_DESC *pix; Pointer to index file variable Description: The last_key function positions the index file pointer at the end of the index file. The function previous_key can then be used to list the file in reverse key order. The last_key function returns the value IX_OK (1). int cdecl next_key(pe, pix); ENTRY *pe; Pointer to variable of type ENTRY IX_DESC *pix; Pointer to index file variable Description: The next_key function returns the next entry in the index file. After deleting or adding keys, next_key returns the key immediately following the addition or deletion. Next_key can be used to locate the desired data record when duplicate keys are allowed. Next_key is used to process files sequential. Next_key returns the value IX_OK (1) if the next key is present and the value EOIX (-2) when the file pointer is at the end of the index file. The following program processes an indexed file sequentially: #include "bplus.h" main() { ENTRY e; IX_DESC names; /* open the index file */ open_index("names.idx", &names); /* now process the file sequentially */ first_key(&names); ret = next_key(&e, &names); while (ret == IX_OK) { /* the data record location is e.recptr */ /* the program would retrieve and process it */ ret = next_key(&e, &names); } /* remember to always close open files */ close_index(&names); } Page 7 HUNTER AND ASSOCIATES B-PLUS FILE INDEXING PROGRAM ------------------------------------------------------------- int cdecl prev_key(pe, pix); ENTRY *pe; Pointer to variable of type ENTRY IX_DESC *pix; Pointer to index file variable Description: The prev_key function returns the previous entry in the index file. After deleting or adding keys, prev_key returns the key immediately preceeding the addition or deletion. Prev_key can be used to process files sequentially in reverse order. Prev_key returns the value IX_OK (1) if there is a previous key and the value EOIX (-2) when the file pointer is at the beginning of the index file. int cdecl find_exact(pe, pix); ENTRY *pe; Pointer to variable of type ENTRY IX_DESC *pix; Pointer to index file variable Description: The find_exact function searches the index file for the key value contained in pe.key and the data record position stored in pe.recptr. If an exact match is found for both of these values, the value IX_OK (1) is returned, and the internal index file pointer is set to that position in the index file. If an exact match is not found, the value IX_FAIL (0) is returned. TAILORING OR CHANGING B-PLUS Most applications can use the B-PLUS program as it is distributed by Hunter and Associates without any changes. It allows multiple, large data files to be indexed in a fast, efficient manner. However, a description of the values that can be changed to tailor B-PLUS are given in this section. An index tree becomes full when no more entries can be added to the tree. This is the case when adding another entry to the tree would cause the tree to grow larger than its maximum allowed height. This height depends on the size of the index blocks and the average size of the keys used by the data files. The minimum capacity of a b-tree index is given by the following formula: C = (B / E + 1) * (B / (2 * E) + 1) ** (H - 1) where C is the number of entries in the index file, B is the Page 8 HUNTER AND ASSOCIATES B-PLUS FILE INDEXING PROGRAM ------------------------------------------------------------- block size in bytes, E is the average size of an ENTRY in bytes, and H is the maximum tree height. The maximum key size is defined by MAXKEY and is set at 100. The block size is 1024 bytes as defined by IXB_SIZE. Ten bytes are used by pointers so 1014 bytes are used by entries. The size of an ENTRY is 9 + the key length. Thus, if the average key length is 11, an average ENTRY is 20 bytes long and the minimum number of entries that can be contained in a tree of height 4 is: C = (1014 / 20 + 1) * (1014 / 40 + 1) ** 3 = 945,072 If the average ENTRY is 40 bytes long, the minimum number of entries that can be contained in a tree of height 4 is 67,384. The corresponding values for a tree of height 3 are 35,896 and 4927, respectively. The maximum tree height is determined by MAX_LEVELS and is set to eight. Very little memory space is used by allowing the maximum tree height to be this large. B-PLUS increases the height of the tree dynamically as required by the number of records in the data file. If your application does not use long keys and your files are not huge, IXB_SIZE can be changed to 512 bytes with only a slight degradation in performance. The root of an open index file is always memory resident as defined by the variable of type IX_DESC. A dynamic pool of cache buffers is used for other index blocks of the tree. The number of blocks in the pool is defined by NUM_BUFS with a default value of 8. Each memory block is sizeof(IXB_SIZE) + 8 bytes in length so approximately 8K of memory is used for cache storage of index blocks. If the number of index files that are open simultaneously is larger than 4, you may want to increase NUM_BUFS. If the usage of memory is critical, the number of buffers can be decreased. However, NUM_BUFS must be at least 2. In general, the speed of file access can be expected to improve if the number of buffers is increased since more of the index file is memory resident. Because some indices are always memory resident, and because the DOS Operating System requires it, it is very important that all open index files be closed before an application program terminates. As stated earlier, the B-PLUS program has been tested Page 9 HUNTER AND ASSOCIATES B-PLUS FILE INDEXING PROGRAM ------------------------------------------------------------- using Microsoft's Optimizing C Compilers, Versions 4, 4.5 and 5.0, and Borland's Turbo C, Version 1.0. However, standard K & R programming guidelines are followed so B-PLUS should be able to be used with other C Compilers with little modification. Since B-PLUS is non-recursive, the usage of stack space does not change dynamically. It is recommend that the B-PLUS program be complied without stack checking. For Microsoft C, the /Ox option can be used to maximize speed and minimize code size. For Turbo C, B-PLUS can be complied with maximum optimization to minimize the object size and improve performance. ACKNOWLEDGMENTS AND REFERENCES The following reference materials were used and helpful in writing the B-PLUS program: Wirth, Niklaus: Algorithms + Data Structures = Programs Prentice Hall (1976) Hunt, William James: The C Toolbox (Serious C Programming for the IBM PC) Addison-Wesley (1985) Wirth's book is a standard reference source for binary trees and data structures. Hunt's C Toolbox contains useful C programming concepts with carefully constructed programming examples. USING THE BPLUS ROUTINES The BPLUS.C routines must be compiled and the object file (BPLUS.OBJ) loaded with programs that use the B_PLUS toolkit. Several sample C programs have been included which illustrate how to use the BPLUS Toolkit. These programs should be compiled and run to make sure your copy of the program is correct. A SPECIAL NOTE REGARDING MICROSOFT C 4.0 COMPILER The Microsoft C library routines are different for Version 4.0 and for Version 5.0 and Quick C. In particular, the memmove routine must be changed to the memcpy routine in Version 4.0. A macro is included in BPLUS.C which makes this substitution. Remove the comments delimiters for this version. Page 10