.ds CP 2003-2004
.ds TC \'tconfpy\'
.TH TCONFPY 3 "TundraWare Inc."
.SH NAME
tconfpy.py Configuration File Support For Python Applications
.SH SYNOPSIS
It is common to provide an external "configuration file" when writing sophisticated applications. This gives the end-user the ability to easily change program options by editing that file.
.P
\*(TC is a Python module for parsing such configuration files. \*(TC understands and parses a configuration "language" which has a rich set of string-substitution, variable name, conditional, and validation features.
.P
By using \*(TC, you unburden your program from the major responsibility of configuration file parsing and validation, while providing your users a rich set of configuration features.
.SH DOCUMENT ORGANIZATION
This document is divided into 4 major sections:
.P
.B PROGRAMMING USING THE \*(TC API
discusses how to call the configuration file parser, the options available when doing this, and what the parser returns. This is the "Programmer's View" of the module and provides in-depth descriptions of the API, data structures, and options available to the programmer.
.P
.B CONFIGURATION LANGUAGE REFERENCE
describes the syntax and semantics of the configuration language recognized by \*(TC. This is the "User's View" of the package, but both programmers and people writing configuration files will find this helpful.
.P
.B ADVANCED TOPICS
describes some ways to combine the various \*(TC features to do some fairly nifty things.
.P
.B INSTALLATION
explains how to install this package on various platforms. This information can also be found in the \'READ-1ST.txt\' file distributed with the package.
.SH PROGRAMMING USING THE \*(TC API
\*(TC is a Python module and thus available for use by any Python program. This section discusses how to invoke the \*(TC parser, the options available when doing so, and what the parser returns to the calling program.
.P
One small note is in order here.
As a matter of coding style and brevity, the code examples here assume the following Python import syntax:
.nf

    from tconfpy import *

.fi
If you prefer the more pedestrian:
.nf

    import tconfpy

.fi
you will have to prepend all references to a \*(TC object with \'tconfpy.\'. So \'retval = ParseConfig(...\' becomes \'retval = tconfpy.ParseConfig(...\' and so on.
.P
You will also find the test driver code provided in the \*(TC package helpful as you read through the following sections. \'test-tc.py\' is a utility to help you learn and exercise the \*(TC API. Perusing the code therein is helpful as an example of the topics discussed below.
.SS API Overview
The \*(TC API consists of a single call. Only the configuration file to be processed is a required parameter; all the others are optional and default as described below:
.nf

    from tconfpy import *

    retval = ParseConfig(cfgfile,
                         InitialSymTbl={},
                         AllowNewVars=True,
                         AllowNewNamespaces=True,
                         Debug=False,
                         LiteralVars=False
                        )

where:
.fi
.TP
.B cfgfile
(Required Parameter - No Default)
The name of a file containing configuration information.
.TP
.B InitialSymTbl
(Default: {})
A pre-populated symbol table (a Python dictionary). As described below, this must contain valid \'VarDescriptor\' entries for each symbol in the table.
.TP
.B AllowNewVars
(Default: True)
Allow the user to create new variables in the configuration file.
.TP
.B AllowNewNamespaces
(Default: True)
Allow new namespaces to be created in the configuration file.
.TP
.B Debug
(Default: False)
If set to True, \*(TC will provide detailed debugging information about each line processed when it returns.
.TP
.B LiteralVars
(Default: False)
If set to True, this option enables variable substitutions within \'.literal\' blocks of a configuration file. See the section in the language reference below on \'.literal\' usage for details.
.TP
.B retval
An object of type \'tconfpy.RetObj\' used to return parsing results.
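.P
Because parse errors are reported in \'retval.Errors\', the calling program has to check that list itself. The sketch below (Python 3) shows one way to do this. The wrapper name \'load_config\' and its \'parse\' parameter are hypothetical, not part of \*(TC; \'parse\' exists only so the sketch can be exercised with a stand-in parser when \*(TC is not installed. The \'Errors\', \'Warnings\', and \'SymTable\' attributes it reads are the ones described under \*(TC Return Values.
.nf
```python
def load_config(path, parse=None, **options):
    """Parse PATH and return its symbol table, or raise ValueError on errors.

    PARSE defaults to tconfpy.ParseConfig.  It is a parameter only so this
    sketch can run against a stand-in parser; OPTIONS (AllowNewVars, Debug,
    and so on) are passed through unchanged.
    """
    if parse is None:
        from tconfpy import ParseConfig as parse

    retval = parse(path, **options)

    # Warnings are not fatal, so just report them.
    for warning in retval.Warnings:
        print("%s: warning: %s" % (path, warning))

    # An empty Errors list means the configuration file was OK.
    if retval.Errors:
        raise ValueError("%s: %d error(s), first: %s"
                         % (path, len(retval.Errors), retval.Errors[0]))

    # Each entry maps a variable name to its VarDescriptor.
    return retval.SymTable
```
.fi
Raising on the first error is a choice made for this sketch; a program could just as well collect every entry in \'retval.Errors\' and show them all to the user.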
.SS Reasons For Passing An Initial Symbol Table
The simplest way to parse a configuration file is just to call the parser with the name of that file:
.nf

    retval = ParseConfig("myconfigfile")

.fi
Assuming your configuration file is valid, \'ParseConfig()\' will return a symbol table populated with all the variables defined in the file and their associated values. This symbol table will have
.B only
the symbols defined in that file (plus a few built-in and pre-defined symbols needed internally by \*(TC).
.P
However, the API provides a way for you to pass a "primed" symbol table to the parser that contains pre-defined symbols/values of your own choosing. Why on earth would you want to do this? There are a number of reasons:
.nf

1) You may wish to write a configuration file which somehow depends
   on a pre-defined variable that only the calling program can know:

       .if [APPVERSION] == 1.0
           # Set configuration for older application releases
       .else
           # Set configuration for newer releases
       .endif

   In this example, only the calling application can know its own
   version, so it sets the variable APPVERSION in a symbol table
   which is passed to \'ParseConfig()\'.

2) You may wish to "protect" certain variable names by creating them
   ahead of time and marking them as "Read Only".  This is useful when
   you want a variable to be available for use within a configuration
   file, but you do not want users to be able to change its value.  In
   this case, the variable can be referenced in a string substitution
   or conditional test, but cannot be changed.

3) You may want to place limits on what values can be assigned to a
   particular variable.  When a variable is newly defined in a
   configuration file, it just defaults to being a string variable
   without any limits on its length or content.  But when the calling
   program creates a variable, it can also set that variable's
   "descriptor".  By setting various attributes of the variable
   descriptor you can control variable type, content, and range of
   values.
   In other words, you can have \*(TC "validate" what values the user
   assigns to particular variables.  This substantially simplifies
   your application because no invalid variable value will ever be
   returned from the parser.

.fi
.SS How To Create An Initial Symbol Table
A \*(TC "Symbol Table" is really nothing more than a Python dictionary. The key for each dictionary entry is the variable's name and the value is a \*(TC-specific object called a "variable descriptor". Creating new variables in the symbol table involves nothing more than this:
.nf

    from tconfpy import *

    # Create an empty symbol table
    MySymTable = {}

    # Create descriptor for new variable
    MyVarDes = VarDescriptor()

    # Code to fiddle with descriptor contents goes here
    MyVarDes.Value = "MyVal"

    # Now load the variable into the symbol table
    MySymTable["MyVariableName"] = MyVarDes

    # Pass the primed symbol table to the parser
    retval = ParseConfig("myconfigfile", InitialSymTbl=MySymTable)

.fi
The heart of this whole business is the \'VarDescriptor\' object. It "describes" the value and properties of a variable. These descriptor objects have the following attributes and defaults:
.nf

    VarDescriptor.Value     = ""
    VarDescriptor.Writeable = True
    VarDescriptor.Type      = TYPE_STRING
    VarDescriptor.Default   = ""
    VarDescriptor.LegalVals = []
    VarDescriptor.Min       = None
    VarDescriptor.Max       = None

.fi
When \*(TC encounters a new variable in a configuration file, it just instantiates one of these descriptor objects with these defaults for that variable. That is, variables newly-defined in a configuration file are entered into the symbol table as string types, with an initial value of "" and with no restriction on content or length.
.P
But, when you create variables under program control to "prime" an initial symbol table, you can modify the content of any of these attributes for each variable. These descriptor attributes are what \*(TC uses to validate subsequent attempts to change the variable's value in the configuration file. In other words, modifying a variable's descriptor tells \*(TC just what you'll accept as "legal" values for that variable.
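.P
The sketch below (Python 3) builds a primed symbol table that illustrates the three reasons listed earlier. So that it runs without \*(TC installed, it defines a stand-in \'VarDescriptor\' class with the attributes and defaults documented above, plus stand-in TYPE_* constants; a real program would use the objects imported from \*(TC instead, and the real TYPE_* values may differ. The helper \'make_var\' and the variable names \'LogLevel\' and \'Color\' are hypothetical.
.nf
```python
# Stand-ins so the sketch runs without tconfpy; with tconfpy installed,
# use "from tconfpy import *" and drop these definitions.
TYPE_BOOL, TYPE_COMPLEX, TYPE_FLOAT, TYPE_INT, TYPE_STRING = (
    "bool", "complex", "float", "int", "string")


class VarDescriptor:
    """Stand-in with the attributes and defaults documented above."""

    def __init__(self):
        self.Value = ""
        self.Writeable = True
        self.Type = TYPE_STRING
        self.Default = ""
        self.LegalVals = []
        self.Min = None
        self.Max = None


def make_var(value, **attrs):
    """Hypothetical helper: a descriptor whose Value and Default are VALUE."""
    des = VarDescriptor()
    des.Value = des.Default = value
    for name, setting in attrs.items():
        if not hasattr(des, name):  # catch misspelled attribute names early
            raise AttributeError("VarDescriptor has no attribute %r" % name)
        setattr(des, name, setting)
    return des


def build_initial_symtable(app_version):
    return {
        # Reason 1: a value only the calling program can know.
        # Reason 2: Read Only, so the file may test it but not change it.
        "APPVERSION": make_var(app_version, Writeable=False),
        # Reason 3: limit what the user may assign.
        "LogLevel": make_var(3, Type=TYPE_INT, Min=0, Max=5),
        "Color": make_var("red", LegalVals=[r"^(red|green|blue)$"]),
    }
```
.fi
The resulting dictionary would then be passed as \'InitialSymTbl\' in the \'ParseConfig()\' call. Setting \'Default\' alongside \'Value\' is a choice made here so the program can later reset variables; \*(TC itself does not use \'Default\'.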
Each attribute has a specific role:
.TP
.B VarDescriptor.Value
(Default: Empty String)
Holds the current value for the variable.
.TP
.B VarDescriptor.Writeable
(Default: True)
Sets whether or not the user can change the variable's value. Setting this attribute to False makes the variable
.B Read Only.
.TP
.B VarDescriptor.Type
(Default: TYPE_STRING)
One of TYPE_BOOL, TYPE_COMPLEX, TYPE_FLOAT, TYPE_INT, or TYPE_STRING. This defines the type of the variable. Each time \*(TC sees a value being assigned to a variable in the configuration file, it checks to see if that variable already exists in the symbol table. If it does, the parser checks the value being assigned and makes sure it matches the type declared for that variable. For example, suppose you did this when defining the variable \'foo\':
.nf

    VarDescriptor.Type = TYPE_INT

.fi
Now suppose the user puts this in the configuration file:
.nf

    foo = bar

.fi
This will cause a type mismatch error because \'bar\' cannot be coerced into an integer type - it is a string.
.sp
As a general matter, for existing variables, \*(TC attempts to coerce the right-hand-side of an assignment to the type declared for that variable. The least fussy operation here is when the variable is defined as TYPE_STRING because pretty much everything can be coerced into a string. For example, here is how \'foo = 3+8j\' is treated for different type declarations:
.nf

    VarDescriptor.Type     VarDescriptor.Value
    ------------------     -------------------

    TYPE_BOOL              Type Error
    TYPE_COMPLEX           3+8j          (A complex number)
    TYPE_FLOAT             Type Error
    TYPE_INT               Type Error
    TYPE_STRING            \'3+8j\'        (A string)

.fi
This is why the default type for newly-defined variables in the configuration file is TYPE_STRING: they can accept pretty much
.B any
value.
.TP
.B VarDescriptor.Default
(Default: Empty String)
This is a place to store the default value for a given variable.
When a variable is newly-defined in a configuration file, \*(TC places the first value assigned to that variable into this attribute. For variables already in the symbol table, \*(TC does nothing to this attribute. This attribute is not actually used by \*(TC for anything. It is provided as a convenience so that the calling program can easily "reset" every variable to its default value if desired.
.TP
.B VarDescriptor.LegalVals
(Default: [])
Sometimes you want to limit a variable to a specific set of values. That's what this attribute is for. \'LegalVals\' explicitly lists every legal value for the variable in question. If the list is empty, then this validation check is skipped.
.sp
.B IMPORTANT:
If you change the content of \'LegalVals\', make sure it is always a Python list. \*(TC's validation logic presumes this attribute to be a list and will blow up nicely if it is not.
.sp
The exact semantics of \'LegalVals\' vary depending on the type of the variable.
.nf

    Variable Type          What LegalVals Does
    -------------          -------------------
    Boolean                Nothing - Ignored

    Integer, Float,        List of numeric values the
    Complex                user can assign to this variable

                           Examples: [1, 2, 34]
                                     [3.14, 2.73, 6.023e23]
                                     [3.8-4j, 5+8j]

    String                 List of Python regular expressions.
                           User must assign a value to this
                           variable that matches at least one
                           of these regular expressions.

                           Example:  [r'a+.*', r'^AnExactString$']

.fi
The general semantic here is "If \'LegalVals\' is not an empty list, the user must assign a value that matches one of the items in \'LegalVals\'."
.sp
One special note applies to \'LegalVals\' for string variables. \*(TC always assumes that this list contains Python regular expressions. For validation, it grabs each entry in the list, attempts to compile it as a regex, and checks to see if the value the user wants to set matches. If you define an illegal regular expression here, \*(TC will catch it and produce an appropriate error.
.TP
.B VarDescriptor.Min and VarDescriptor.Max
(Default: None)
These set the minimum and maximum legal values for the variables, but the semantics vary by variable type:
.nf

    Variable Type          What Min/Max Do
    -------------          ---------------
    Boolean, Complex       Nothing - Ignored

    Integer, Float         Set Minimum/Maximum allowed values.

    String                 Set Minimum/Maximum string length

.fi
In all cases, if you want these tests skipped, set \'Min\' or \'Max\' to the Python None.
.P
All these various validations are logically "ANDed" together. That is, a new value for a variable must be permitted (the variable is Writeable) AND of the appropriate type AND one of the legal values AND within the min/max range.
.P
\*(TC makes no attempt to harmonize these validation conditions with each other. If you specify a value in \'LegalVals\' that is, say, lower than allowed by \'Min\', you will always get an error when the user sets the variable to that value: it passed the \'LegalVals\' validation but failed the \'Min\' validation.
.SS Some Notes On Boolean Variables
One last note here concerns Boolean variables. Booleans are actually stored in the symbol table as the Python values True or False. However, \*(TC accepts user statements that set the value of the boolean in a number of formats:
.nf

    Boolean True         Boolean False
    ------------         -------------

    foo = 1              foo = 0
    foo = True           foo = False
    foo = Yes            foo = No
    foo = On             foo = Off

.fi
This is the one case where \*(TC is insensitive to case - "tRUE", "TRUE", and "true" are all accepted, for example.
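.P
The validation rules above can be summarized in code. The sketch below (Python 3) is a simplified re-implementation for illustration, not \*(TC's own logic: the function names \'coerce\' and \'check\' and the TYPE_* string values are hypothetical stand-ins, and it assumes \'LegalVals\' regular expressions are applied with Python's \'re.match\' (anchored at the start of the value), which may differ from what \*(TC actually does. It applies the ANDed checks in order: type, then \'LegalVals\', then \'Min\'/\'Max\'. The Read Only (\'Writeable\') check is left out.
.nf
```python
import re

# Hypothetical stand-ins for the TYPE_* constants.
TYPE_BOOL, TYPE_COMPLEX, TYPE_FLOAT, TYPE_INT, TYPE_STRING = (
    "bool", "complex", "float", "int", "string")

TRUE_WORDS = {"1", "true", "yes", "on"}
FALSE_WORDS = {"0", "false", "no", "off"}


def coerce(text, vtype):
    """Convert an assignment's right-hand side; ValueError on a type mismatch."""
    if vtype == TYPE_BOOL:
        word = text.lower()  # booleans are the one case-insensitive input
        if word in TRUE_WORDS:
            return True
        if word in FALSE_WORDS:
            return False
        raise ValueError("not a boolean: %s" % text)
    convert = {TYPE_COMPLEX: complex, TYPE_FLOAT: float,
               TYPE_INT: int, TYPE_STRING: str}[vtype]
    return convert(text)  # int("bar") and float("3+8j") raise ValueError


def check(text, vtype, legal_vals=(), vmin=None, vmax=None):
    """Return the coerced value only if every (ANDed) test passes."""
    value = coerce(text, vtype)
    if vtype == TYPE_BOOL:
        return value  # LegalVals and Min/Max are ignored for booleans

    if legal_vals:  # an empty list skips this test
        if vtype == TYPE_STRING:
            try:
                ok = any(re.match(rx, value) for rx in legal_vals)
            except re.error as exc:
                raise ValueError("bad regex in LegalVals: %s" % exc)
        else:
            ok = value in legal_vals
        if not ok:
            raise ValueError("not a legal value: %s" % text)

    if vtype != TYPE_COMPLEX:  # Min/Max are ignored for complex numbers
        size = len(value) if vtype == TYPE_STRING else value
        if vmin is not None and size < vmin:
            raise ValueError("below Min: %s" % text)
        if vmax is not None and size > vmax:
            raise ValueError("above Max: %s" % text)
    return value
```
.fi
Because the tests are independent, \'check("1", TYPE_INT, legal_vals=[1, 2], vmin=2)\' is rejected even though 1 is listed as legal, which mirrors the unharmonized behavior described above.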
.B NOTE HOWEVER:
If the user wants to do a conditional test on the value of a boolean they
.B must
observe case and test for either \'True\' or \'False\':
.nf

    .if [boolvar] != False    # This works fine

    .if [boolvar] != FALSE    # This does not work - Case is not being observed

    .if [boolvar] != Off      # Neither does this - Only True and False can be tested

.fi
.SS How The \*(TC Parser Validates The Initial Symbol Table
When you pass an initial symbol table to the parser, \*(TC does some basic validation that the table contents properly conform to the \'VarDescriptor\' format and generates error messages if it finds problems. However, the program does
.B not
check your specifications to see if they make sense. For instance, if you define an integer with a minimum value of 100 and a maximum value of 50, \*(TC cheerfully accepts these limits even though they are impossible. You'll just be unable to do anything with that variable - any attempt to change its value will cause an error to be recorded. Similarly, if you put a value in \'LegalVals\' that is outside the range of \'Min\' to \'Max\', \*(TC will accept it quietly.
.SS The \'AllowNewVars\' Option
By default, \*(TC lets the user define any new variables they wish in a configuration file, merely by placing a line in the file in this form:
.nf

    Varname = Value

.fi
However, you can disable this capability by calling the parser like this:
.nf

    retval = ParseConfig("myconfigfile", AllowNewVars=False)

.fi
This means that the configuration file can "reference" any pre-defined variables, and even change their values (if they are not Read-Only), but it cannot create
.B new
variables.
.P
This feature is primarily intended for use when you pass an initial symbol table to the parser and you do not want any other variables defined by the user. Why? There are several possible uses for this option:
.nf

1) You know every configuration variable name the calling program
   will use ahead of time.
   Disabling new variable names keeps the configuration file from
   getting cluttered with variables that the calling program will
   ignore anyway, thereby keeping the file more readable.

2) You want to insulate your user from silent errors caused by
   misspellings.  Say your program looks for a configuration
   variable called \'MyEmail\' but the user enters something like
   \'myemail = foo@bar.com\'.  \'MyEmail\' and \'myemail\' are
   entirely different variables and only the former is recognized
   by your calling program.  By turning off new variable creation,
   the user's inadvertent misspelling of the desired variable name
   will be flagged as an error.

.fi
Note, however, that there is one big drawback to disabling new variable creation. \*(TC processes the configuration file on a line-by-line basis. No line "continuation" is supported. For really long variable values and ease of maintenance, it is sometimes helpful to create "intermediate" variables that hold temporary values used to construct a variable actually needed by the calling program. For example:
.nf

    inter1 = Really, really, really, really, long argument 1
    inter2 = Really, really, really, really, long argument 2

    realvar = command [inter1] [inter2]

.fi
If you disable new variable creation you can't do this anymore unless all the variables \'inter1\', \'inter2\', and \'realvar\' are predefined in the initial symbol table passed to the parser.
.SS The \'AllowNewNamespaces\' Option
By default, \*(TC supports the use of an arbitrary number of lexical namespaces. They can be predefined in an initial symbol table passed to the parser and/or created in the configuration file as desired. (The details are described in a later section of this document.)
.P
There may be times, however, when you do not want users creating new namespaces on their own.
The reasons are much the same as for preventing the creation of new variables in the option above: maintaining simplicity and clarity in the configuration file and preventing "silent" errors due to misspellings.
.P
In this case, call the API with \'AllowNewNamespaces=False\' and the creation of new namespaces in the configuration file will be disabled. Any attempt to create a new namespace via either the \'[new-ns-name]\' or \'NAMESPACE=new-ns-name\' methods will cause a parse error to be generated.
.P
It is important to realize that this option only disables the creation of
.B new
namespaces. As discussed in the later section on namespace processing, it is possible to pass an initial symbol table to the parser which has one or more pre-defined namespaces in it. Each of these pre-defined namespaces is available for use throughout the configuration file even if \'AllowNewNamespaces\' is set to False.
.SS The \'Debug\' Option
\*(TC has a fairly rich set of debugging features built into its parser. It can provide some detail about each line parsed as well as overall information about the parse. By default, debugging is turned off. To enable debugging, merely set \'Debug=True\' in the API call:
.nf

    retval = ParseConfig("myconfigfile", Debug=True)

.fi
.SS The \'LiteralVars\' Option
\*(TC supports the inclusion of literal text anywhere in a configuration file via the \'.literal\' directive. This directive effectively tells the \*(TC parser to pass every line it encounters "literally" until it sees a corresponding \'.endliteral\' directive. By default, \*(TC does
.B exactly
this. However, \*(TC has very powerful variable substitution mechanisms. You may want to embed variable references in a "literal" block and have them replaced by \*(TC.
Here is an example:
.nf

    MyEmail = me@here.com   # This defines variable MyEmail

    .literal
        printf("[MyEmail]");  /* A C Statement */
    .endliteral

.fi
By default, \'ParseConfig()\' will leave everything within the \'.literal\'/\'.endliteral\' block unchanged. In our example, the string:
.nf

    printf("[MyEmail]");  /* A C Statement */

.fi
would be in the list of literals returned by \'ParseConfig()\'.
.P
However, we can ask \*(TC to do variable replacement
.B within
literal blocks by setting \'LiteralVars=True\' in the \'ParseConfig()\' call:
.nf

    retval = ParseConfig("myconfigfile", LiteralVars=True)

.fi
In this example, \*(TC would return:
.nf

    printf("me@here.com");  /* A C Statement */

.fi
At first glance this seems only mildly useful, but it is actually very handy. As described later in this document, \*(TC has a rich set of conditional operators and string substitution facilities. You can use these features along with literal block processing and variable substitution within those blocks. This effectively lets you use \*(TC as a preprocessor for
.B any
other language or text.
.SS How \*(TC Processes Errors
As a general matter, when \*(TC encounters an error in the configuration file currently being parsed, it does two things. First, it adds a descriptive error message into the list of errors returned to the calling program (see the next section). Second, in many cases, notably during conditional processing, it sets the parser state so the block in which the error occurred is logically False. This does not happen in every case, however.
.P
If you are having problems with errors, enable the debugging features of the package and look at the debug output. It provides detailed information about what caused the error and why.
.SS \*(TC Return Values
When \*(TC is finished processing the configuration file it returns an object which contains the entire results of the parse.
This includes a symbol table, any relevant error or warning messages, debug information (if you requested this via the API), and any "literal" lines encountered in the configuration.
.P
The return object is an instance of the class \'tconfpy.RetObj\' which is nothing more than a container to hold return data. In the simplest case, we can parse and extract results like this:
.nf

    from tconfpy import *

    retval = ParseConfig("myconfigfile", Debug=True)

.fi
\'retval\' now contains the results of the parse:
.nf

    retval.Errors    - A Python list containing error messages.  If this
                       list is empty, you can infer that there were no
                       parsing errors - i.e., the configuration file was OK.

    retval.Warnings  - A Python list containing warning messages.  These
                       describe minor problems not fatal to the parse
                       process, but that you really ought to clean up in
                       the configuration file.

    retval.SymTable  - A Python dictionary which lists all the defined
                       symbols and their associated values.  A "value"
                       in this case is always an object of type
                       tconfpy.VarDescriptor (as described above).

    retval.Literals  - As described in the \'LiteralVars\' section above,
                       the \*(TC configuration language supports a
                       \'.literal\' directive.  This directive allows the
                       user to embed literal text anywhere in the
                       configuration file.  This effectively makes \*(TC
                       useful as a preprocessor for any other language
                       or text.  retval.Literals is a Python list
                       containing all literal text discovered during the
                       parse.

    retval.Debug     - A Python list containing detailed debug
                       information for each line parsed as well as some
                       brief summary information about the parse.
                       retval.Debug defaults to an empty list and is only
                       populated if you set \'Debug=True\' in the API call
                       that initiated the parse (as in the example above).

.fi
.SH CONFIGURATION LANGUAGE REFERENCE
\*(TC recognizes a full-featured configuration language that includes variable creation and value assignment, a full preprocessor with conditionals, type and value enforcement, and lexical namespaces.
This section of the document describes that language and provides examples of how each feature can be used.
.SS \*(TC Configuration Language Syntax
\*(TC supports a fairly simple and direct configuration language syntax:
.IP \(bu 4
Each line is treated independently. There is no line "continuation".
.IP \(bu 4
The \'#\' character can begin a comment anywhere on a line. This is done blindly. If you need to embed this symbol somewhere within a variable value, use the \'[HASH]\' variable reference.
.IP \(bu 4
Whitespace is (mostly) insignificant. Leading and trailing whitespace is ignored, as is whitespace around comparison operators. However, there are some places where whitespace matters:
.nf

    - Variable names may not contain whitespace.

    - Directives must be followed by whitespace if they take other
      arguments.

    - When assigning a value to a string variable, whitespace within
      the value on the right-hand-side is preserved.  Leading and
      trailing whitespace around the right-hand-side of the
      assignment is ignored.

.fi
.IP \(bu 4
Case is always significant except when assigning a value to Booleans (described in the section on Boolean variables above).
.IP \(bu 4
Regardless of a variable's type, all variable references return
.B a string representation of the variable's value!
This is done so that the variable's value can be used for comparison testing and string substitution/concatenation. In other words, variables are stored in their native type in the symbol table that is returned to the calling program, but they are treated as strings throughout the configuration file.
.SS Variables And Variable References
The heart of a configuration file is a "variable". Variables are stored in a "Symbol Table" which is returned to the calling program once the configuration file has been processed. The calling program can pre-define any variables it wishes before processing a configuration file.
You can normally also define your own new variables in the configuration file as desired (unless the programmer has inhibited new variable creation).
.P
Variables are assigned values like this:
.nf

    MyVariable = Some string of text

.fi
If \'MyVariable\' is a new variable, \*(TC will create it on the spot. If it already exists, \*(TC will first check and make sure that \'Some string of text\' is a legal value for this variable. If not, it will produce an error and refuse to change the current value of \'MyVariable\'.
.P
You can get the value of any currently defined variable by "referencing" it like this:
.nf

    ... [MyVariable] ...

.fi
The brackets surrounding any name are what indicate that you want that variable's value.
.P
You can also get the value of any Environment Variable on your system by naming the variable with a leading \'$\':
.nf

    ... [$USER] ...    # Gets the value of the USER environment variable

.fi
However, you cannot set the value of an environment variable:
.nf

    $USER = me   # This is not permitted

.fi
Variables can be named pretty much anything you like, with certain restrictions:
.IP \(bu 4
Variable names may not contain whitespace.
.IP \(bu 4
Variable names may not begin with \'$\'. The one exception to this is when you are referencing the value of an environment variable as just described.
.IP \(bu 4
Variable names cannot have the \'#\' character anywhere in them because \*(TC sees that character as the beginning of a comment.
.IP \(bu 4
You cannot have a variable with no name. This is illegal:
.nf

    = String

.fi
.SS Predefined Variables
.SS The \'.include\' Directive
.SS Conditional Directives
.SS \'.literal\' - Using \*(TC As A Preprocessor For Other Languages
.SS Type And Value Enforcement
.SS Lexical Namespaces
.SH ADVANCED TOPICS
.SS Guaranteeing A Correct Base Configuration
.SS Enforcing Mandatory Configurations
.SS Iterative Parsing
.SH INSTALLATION
There are three ways to install \*(TC depending on your preferences and type of system.
In each of these installation methods you must be logged in with root authority on Unix-like systems or as the Administrator on Win32 systems.
.SS Preparation - Getting And Extracting The Package
For the first two installation methods, you must first download the latest release from:
.nf

    http://www.tundraware.com/Software/tconfpy/

.fi
Then unpack the contents by issuing the following command:
.nf

    tar -xzvf py-tconfpy-X.XXX.tar.gz   (where X.XXX is the version number)

.fi
Win32 users who do not have tar installed on their system can find a Windows version of the program at:
.nf

    http://unxutils.sourceforge.net/

.fi
.SS Install Method #1 - All Systems (Semi-Automated)
Enter the directory created in the unpacking step above. Then issue the following command:
.nf

    python setup.py install

.fi
This will install the \*(TC module and compile it.
.P
You will have to manually copy the \'test-tc.py\' program to a directory somewhere in your executable path. Similarly, copy the documentation files to locations appropriate for your system.
.SS Install Method #2 - All Systems (Manual)
Enter the directory created in the unpacking step above. Then, manually copy the tconfpy.py file to a directory somewhere in your PYTHONPATH. The recommended location for Unix-like systems is:
.nf

    .../pythonX.Y/site-packages

.fi
For Win32 systems, the recommended location is:
.nf

    ...\\PythonX.Y\\lib\\site-packages

.fi
Here, X.Y is the Python release number.
.P
You can pre-compile the \*(TC module by starting Python interactively and then issuing the command:
.nf

    import tconfpy

.fi
Manually copy the \'test-tc.py\' program to a directory somewhere in your executable path. Copy the documentation files to locations appropriate for your system.
.SS Install Method #3 - FreeBSD Only (Fully-Automated)
Make sure you are logged in as root, then:
.nf

    cd /usr/ports/devel/py-tconfpy
    make install

.fi
This is a fully-automated install that puts both code and documentation where it belongs.
After this command has completed you'll find the license agreement and all the documentation (in the various formats) in:
.nf

    /usr/local/share/doc/py-tconfpy

.fi
The \'man\' pages will have been properly installed so either of these commands will work:
.nf

    man tconfpy
    man test-tc

.fi
.SS Bundling \*(TC With Your Own Programs
If you write a program that depends on \*(TC you'll need to ensure that the end-users have it installed on their systems. There are two ways to do this:
.nf

1) Tell them to download and install the package as described above.
   This is not recommended since you cannot rely on the technical
   ability of end users to do this correctly.

2) Just include \'tconfpy.py\' in your program distribution directory.
   This ensures that the module is available to your program
   regardless of what the end-user system has installed.

.fi
.SH OTHER
\*(TC requires Python 2.3 or later.
.SH BUGS AND MISFEATURES
None known as of this release.
.SH COPYRIGHT AND LICENSING
\*(TC is Copyright(c) \*(CP TundraWare Inc. For terms of use, see the tconfpy-license.txt file in the program distribution. If you install \*(TC on a FreeBSD system using the \'ports\' mechanism, you will also find this file in /usr/local/share/doc/py-tconfpy.
.SH AUTHOR
.nf
Tim Daneliuk
tconfpy@tundraware.com
.fi