Newer
Older
Microsoft / bplus / BPLUS.DOC
@tundra tundra on 24 May 2012 24 KB Initial revision






                               THE B-PLUS PROGRAM
                         A B-TREE INDEXING FILE MODULE
                               FOR C PROGRAMMERS
                                       by
                             Hunter and Associates



              B-PLUS is a versatile, carefully designed module for C
         programmers who need a fast, efficient program for indexing
         data files.  B-PLUS allows data records to be retrieved based
         on a key value without regard to their position in the data
         file.  The data records can also be accessed in sequential
         order in either a forward and reverse direction.

              The B-PLUS Program Module is based on the famous and
         widely used b-tree algorithm and has a number of useful
         extensions which are not found in many programs of this type.
         Some of its features are the following:

              - Variable length keys are allowed

              - File size limited only by DOS or by disk space

              - All functions are non-recursive so very little stack
                space is required

              - The most recently used key values are stored in a
                cache buffer in main memory for fast access

              - Duplicate keys are allowed

              Version 1.1A of the B-PLUS program has been tested
         for Microsoft C Compilers, Versions 4.0, 5.0, 5.1 and the
         Borland Turbo C Compiler Version 1.5.  The compiled object
         file is less than 10K bytes in length for these compilers.
         See the instructions at the end of this user's guide for a
         special note regarding Microsoft's older C Version 4.0.

              Version 1.1A has several new features that were not in
         Version 1.0.  The next_key and prev_key routines can now be
         called immediately after adding or deleting an index key.  It
         is no longer necessary to "reset" the index file with a
         find_key or locate_key function call after adding or deleting
         keys.  All known bugs have been corrected in Version 1.1A.



         LICENSE AND REGISTRATION

              B-PLUS is distributed as a "share ware" program.  Please
         help us get it known by giving unmodified copies of the


         HUNTER AND ASSOCIATES            B-PLUS FILE INDEXING PROGRAM
         -------------------------------------------------------------


         program and documentation to other programmers who may find
         B-PLUS useful.

              B-PLUS is copyright (C) 1987 by Hunter and Associates.
         It is not public domain or free software.  Non-registered
         users are granted a limited license to use B-PLUS on a trial
         basis for determining whether or not it is suitable for their
         needs.  Registration permits the use of B-PLUS on one CPU and
         allows the use of the compiled B-PLUS modules in programs for
         general sale and/or distribution.

              The registration fee is $25 or $35.  Users who pay the
         $35 fee will be sent a disk containing a fully commented
         listing of the latest source code, the user documentation,
         and a number of useful sample programs.  Users who pay the
         $25 fee are not sent a new disk but are included in the
         mailing list for announcements about both current and future
         products.  Your prompt registration of your copy of the B-
         PLUS program is appreciated.

              A trial disk with supporting documentation is available
         directly from Hunter and Associates for $10.

              Register your usage of B-PLUS by sending the registra-
         tion fee to:

                        Hunter and Associates
                        7900 Edgewater Drive
                        Wilsonville, OR  97070
                        Telephone: (503) 694-1449

         Your comments regarding the B-PLUS program or any suggestions
         you have for extensions or for other programs that would be
         useful to you are welcomed.

              Hunter and Associates makes no warranties whatsoever
         regarding the B-PLUS computer programs or the supporting
         documentation.


         USING B-PLUS IN YOUR PROGRAMS

              The B-PLUS File Indexing Module contains twelve
         functions that handle the retrieval of data records by key
         value.  The keys that are used to locate records are null
         terminated strings.  The data structures and constants that
         are used are defined in the header file bplus.h.

              If the data record field that you want to use as a key
         contains numeric data, you can use one of the library

                                   Page 2


         HUNTER AND ASSOCIATES            B-PLUS FILE INDEXING PROGRAM
         -------------------------------------------------------------


         conversion routines (fcvt, evct, sprintf) to convert the data
         to string format.

              The connection between a key and its reference is
         formalized as a structure of type ENTRY.  This structure
         contains three elements:

         typedef struct
           {
             RECPOS   idxptr;         /* long pointer to next index
                                         level                      */
             RECPOS   recptr;         /* long pointer to the file
                                         position of data record    */
             char     key[MAXKEY];    /* with this key value        */
           } ENTRY;

              The application program uses only the recptr and key[]
         fields.  The idxptr is used and maintained by the B-PLUS
         modules.

              A variable of type IX_DESC is declared for each open
         index file.  See the header file bplus.h if you are
         interested in the elements of this structure.  ENTRY and
         IX_DESC are the only two data types that are normally used by
         application programs.

              Here is a sample program stub which calls the open_index
         and find_index subroutines.


         Example:

           #include "bplus.h"
           main()
             {
                ENTRY    e;
                IX_DESC  names;

                /* open index file called NAMES.IDX */

                open_index("NAMES.IDX", &names, 0);

                /* find an index record for John Smith */

                strcpy(e.key, "SMITH JOHN");
                if(find_key(&e, &names))
                  printf("Data record address is %ld", e.recptr);
                else
                  printf("Cannot find record for that key");
              }

                                   Page 3


         HUNTER AND ASSOCIATES            B-PLUS FILE INDEXING PROGRAM
         -------------------------------------------------------------


         Each of the twelve subroutines is now described.

         int cdecl open_index(name, pix, dup);

              char *name;         File path name
              IX_DESC *pix;       Pointer to index file variable
              int dup;            0 - no duplicate keys allowed
                                  1 - allow duplicate keys

              Description:  The open_index function is used to open
              and initialize an existing index file specified by name
              and prepares the file for subsequent reading or writing.
              The file structure variable pix is defined in the
              application program.  Duplicate keys are allowed or not
              allowed depending on whether dup has the value of 0 or
              1.  The open_index function returns the value IX_OK (1)
              if the file is opened successfully.  If the file cannot
              be opened, an error message is displayed and the program
              is aborted.



         int cdecl make_index(name, pix, dup);

              char *name;         File path name
              IX_DESC *pix;       Pointer to index file variable
              int dup;            0 - no duplicate keys allowed
                                  1 - allow duplicate keys

              Description:  The make_index function is used to create
              and initialize a new index file specified by name and to
              prepare the file for subsequent reading or writing.  If
              a file of this name already exists, its contents are
              destroyed when the new file is created.  The file
              structure variable pix is defined in the application
              program.  Duplicate keys are allowed or not allowed
              depending on whether dup has the value of 0 or 1.  The
              make_index function returns the value IX_OK (1) if the
              file is created successfully.  If the file cannot be
              created, an error message is displayed and the program
              is aborted.



         int cdecl close_index(pix);

              IX_DESC *pix;       Pointer to index file variable

              Description:  The close_index file function clears the
              internal cache buffer and closes the specified index

                                   Page 4


         HUNTER AND ASSOCIATES            B-PLUS FILE INDEXING PROGRAM
         -------------------------------------------------------------


              file.  It is very important that each index file be
              closed.  Otherwise data that is stored in the internal

              cache buffer may be lost and the index file may not be
              properly updated.  The close_index function returns the
              value IX_OK (1) if the file is successfully closed.



         int cdecl find_key(pe, pix);

              ENTRY *pe;          Pointer to variable of type ENTRY
              IX_DESC *pix;       Pointer to index file variable

              Description:  The find_key function searches the index
              file for the key value contained in pe.key.  If an exact
              match is found, the value IX_OK (1) is returned and the
              location of the data record with this key value is
              stored in pe.recptr.  If an exact match is not found,
              the value IX_FAIL (0) is returned and pe.recptr is
              undefined.  If the index file contains duplicate keys,
              the first key is always found.



         int cdecl locate_key(pe, pix);

              ENTRY *pe;          Pointer to variable of type ENTRY
              IX_DESC *pix;       Pointer to index file variable

              Description:  The locate key function searches the index
              file for the first key value which is equal to or
              greater than that stored in pe.key.  The location of the
              data record which is equal to or greater than pe.key is
              stored in pe.recptr.  This function can be used to
              locate an entry in the index file when only part of the
              key value is known.  If the index file contains
              duplicate keys, locate_key will locate the first key.
              The following values are returned by locate_key:

                   IX_OK  -  the value (1) is returned if an exact
                             match is found

                   IX_FAIL - the value (0) is returned if an exact
                             match is not found

                   EOIX  -   the value (-2) is returned for end of
                             index if the search key is greater than
                             all keys in the index file and pe.recptr
                             is undefined.

                                   Page 5


         HUNTER AND ASSOCIATES            B-PLUS FILE INDEXING PROGRAM
         -------------------------------------------------------------




         int cdecl add_key(pe, pix);

              ENTRY *pe;          Pointer to variable of type ENTRY
              IX_DESC *pix;       Pointer to index file variable

              Description:  The add_key function adds new entries to
              the index file.  The calling program stores the key
              value in pe.key and the data record address in
              pe.recptr.  Add_key first looks to see if an entry with
              this key already exists in the index file.  If no entry
              with this key exists, the new entry is added.  If an
              entry with this key already exists, the new entry is
              added only if duplicate keys are allowed (as defined by
              the open_index function).  If the entry is successfully
              added, IX_OK (1) is returned; otherwise IX_FAIL (0) is
              returned.



         int cdecl delete_key(pe, pix);

              ENTRY *pe;          Pointer to variable of type ENTRY
              IX_DESC *pix;       Pointer to index file variable

              Description:  The delete_key function deletes entries
              in the index file.  The key to be deleted is stored in
              pe.key.  If duplicate records are allowed, the
              corresponding data record address must also be stored in
              pe.recptr.  In this case, delete key needs the record
              number to distinguish entries.  If there are not
              duplicate entries, this field is ignored.  If the entry
              is successfully deleted, IX_OK (1) is returned;
              otherwise IX_FAIL (0) is returned.  The space that was
              occupied by the entry is marked as free for reused by
              B_PLUS.



         int cdecl first_key(pix);

              IX_DESC *pix;       Pointer to index file variable

              Description:  The first_key function positions the index
              file pointer to the beginning of the index file.  The
              function next_key can then be used to list the file in
              key order.  The first_key function returns the value
              IX_OK (1).


                                   Page 6


         HUNTER AND ASSOCIATES            B-PLUS FILE INDEXING PROGRAM
         -------------------------------------------------------------


         int cdecl last_key(pix);

              IX_DESC *pix;       Pointer to index file variable

              Description:  The last_key function positions the index
              file pointer at the end of the index file.  The function
              previous_key can then be used to list the file in
              reverse key order.  The last_key function returns the
              value IX_OK (1).



         int cdecl next_key(pe, pix);

              ENTRY *pe;          Pointer to variable of type ENTRY
              IX_DESC *pix;       Pointer to index file variable

              Description:  The next_key function returns the next
              entry in the index file.  After deleting or adding keys,
              next_key returns the key immediately following the
              addition or deletion.  Next_key can be used to locate
              the desired data record when duplicate keys are allowed.
              Next_key is used to process files sequential.  Next_key
              returns the value IX_OK (1) if the next key is present
              and the value EOIX (-2) when the file pointer is at the
              end of the index file.  The following program processes
              an indexed file sequentially:

              #include "bplus.h"
              main()
                {
                  ENTRY e;
                  IX_DESC names;

                  /* open the index file */
                  open_index("names.idx", &names);

                  /* now process the file sequentially */
                  first_key(&names);
                  ret = next_key(&e, &names);
                  while (ret == IX_OK)
                    {
                      /* the data record location is e.recptr */
                      /* the program would retrieve and process it */
                      ret = next_key(&e, &names);
                    }

                  /* remember to always close open files */
                  close_index(&names);
                }

                                   Page 7


         HUNTER AND ASSOCIATES            B-PLUS FILE INDEXING PROGRAM
         -------------------------------------------------------------


         int cdecl prev_key(pe, pix);

              ENTRY *pe;          Pointer to variable of type ENTRY
              IX_DESC *pix;       Pointer to index file variable

              Description:  The prev_key function returns the previous
              entry in the index file.  After deleting or adding keys,
              prev_key returns the key immediately preceeding the
              addition or deletion.  Prev_key can be used to process
              files sequentially in reverse order. Prev_key returns
              the value IX_OK (1) if there is a previous key and the
              value EOIX (-2) when the file pointer is at the
              beginning of the index file.



         int cdecl find_exact(pe, pix);

              ENTRY *pe;          Pointer to variable of type ENTRY
              IX_DESC *pix;       Pointer to index file variable

              Description:  The find_exact function searches the index
              file for the key value contained in pe.key and the data
              record position stored in pe.recptr.  If an exact match
              is found for both of these values, the value IX_OK (1)
              is returned, and the internal index file pointer is set
              to that position in the index file.  If an exact match
              is not found, the value IX_FAIL (0) is returned.



         TAILORING OR CHANGING B-PLUS

              Most applications can use the B-PLUS program as it is
         distributed by Hunter and Associates without any changes.  It
         allows multiple, large data files to be indexed in a fast,
         efficient manner.  However, a description of the values that
         can be changed to tailor B-PLUS are given in this section.

              An index tree becomes full when no more entries can be
         added to the tree.  This is the case when adding another
         entry to the tree would cause the tree to grow larger than
         its maximum allowed height.  This height depends on the size
         of the index blocks and the average size of the keys used by
         the data files.  The minimum capacity of a b-tree index is
         given by the following formula:

              C = (B / E + 1) * (B / (2 * E)  + 1) ** (H - 1)

         where C is the number of entries in the index file, B is the

                                   Page 8


         HUNTER AND ASSOCIATES            B-PLUS FILE INDEXING PROGRAM
         -------------------------------------------------------------


         block size in bytes, E is the average size of an ENTRY in
         bytes, and H is the maximum tree height.

              The maximum key size is defined by MAXKEY and is set at
         100.  The block size is 1024 bytes as defined by IXB_SIZE.
         Ten bytes are used by pointers so 1014 bytes are used by
         entries.  The size of an ENTRY is 9 + the key length.

              Thus, if the average key length is 11, an average ENTRY
         is 20 bytes long and the minimum number of entries that can
         be contained in a tree of height 4 is:

              C = (1014 / 20 + 1) * (1014 / 40 + 1) ** 3
                = 945,072

         If the average ENTRY is 40 bytes long, the minimum number of
         entries that can be contained in a tree of height 4 is
         67,384.  The corresponding values for a tree of height 3 are
         35,896 and 4927, respectively.

              The maximum tree height is determined by MAX_LEVELS and
         is set to eight.  Very little memory space is used by
         allowing the maximum tree height to be this large.  B-PLUS
         increases the height of the tree dynamically as required by
         the number of records in the data file.

              If your application does not use long keys and your
         files are not huge, IXB_SIZE can be changed to 512 bytes with
         only a slight degradation in performance.

              The root of an open index file is always memory resident
         as defined by the variable of type IX_DESC.  A dynamic pool
         of cache buffers is used for other index blocks of the tree.
         The number of blocks in the pool is defined by NUM_BUFS with
         a default value of 8.  Each memory block is sizeof(IXB_SIZE)
         + 8 bytes in length so approximately 8K of memory is used for
         cache storage of index blocks.  If the number of index files
         that are open simultaneously is larger than 4, you may want
         to increase NUM_BUFS.  If the usage of memory is critical,
         the number of buffers can be decreased.  However, NUM_BUFS
         must be at least 2.  In general, the speed of file access can
         be expected to improve if the number of buffers is increased
         since more of the index file is memory resident.

              Because some indices are always memory resident, and
         because the DOS Operating System requires it, it is very
         important that all open index files be closed before an
         application program terminates.

              As stated earlier, the B-PLUS program has been tested

                                   Page 9


         HUNTER AND ASSOCIATES            B-PLUS FILE INDEXING PROGRAM
         -------------------------------------------------------------


         using Microsoft's Optimizing C Compilers, Versions 4, 4.5 and
         5.0, and Borland's Turbo C, Version 1.0.  However, standard K
         & R programming guidelines are followed so B-PLUS should be
         able to be used with other C Compilers with little
         modification.  Since B-PLUS is non-recursive, the usage of
         stack space does not change dynamically.  It is recommend
         that the B-PLUS program be complied without stack checking.
         For Microsoft C, the /Ox option can be used to maximize speed
         and minimize code size.  For Turbo C, B-PLUS can be complied
         with maximum optimization to minimize the object size and
         improve performance.


         ACKNOWLEDGMENTS AND REFERENCES

              The following reference materials were used and helpful
         in writing the B-PLUS program:

              Wirth, Niklaus:
                     Algorithms + Data Structures = Programs
                     Prentice Hall (1976)

              Hunt, William James:
                     The C Toolbox
                     (Serious C Programming for the IBM PC)
                     Addison-Wesley (1985)


              Wirth's book is a standard reference source for binary
         trees and data structures.  Hunt's C Toolbox contains useful
         C programming concepts with carefully constructed programming
         examples.


         USING THE BPLUS ROUTINES

              The BPLUS.C routines must be compiled and the object
         file (BPLUS.OBJ) loaded with programs that use the B_PLUS
         toolkit.  Several sample C programs have been included which
         illustrate how to use the BPLUS Toolkit.  These programs
         should be compiled and run to make sure your copy of the
         program is correct.


         A SPECIAL NOTE REGARDING MICROSOFT C 4.0 COMPILER

              The Microsoft C library routines are different for
         Version 4.0 and for Version 5.0 and Quick C.  In particular,
         the memmove routine must be changed to the memcpy routine in
         Version 4.0.  A macro is included in BPLUS.C which makes this
         substitution.  Remove the comments delimiters for this version.

                                  Page 10