Qgdbm Package for tclgdbm


Introduction

The package Qgdbm is wrapped around Tclgdbm and provides a more convenient way to store and retrieve data. It is, however, not a replacement for a "professional" database, but it serves well for applications which simply have to store a small or medium amount of data that fits into a simple table-structure.

With Qgdbm you can access information from gdbm-files in an SQL-like fashion. It is not really SQL, but a more Tcl-adapted SQL; let's call it TSQL (for Tcl-SQL or Tiny-SQL).

Qgdbm stores each table in a separate file named <table-name>.qg. All tables are stored in a specified root-directory.
All Qgdbm table-files have one reserved key/value-pair with the key "@". This entry stores administrative information such as the table-structure, the number of inserts, the date of creation, ...
Because all needed information about a table is stored in the table-file itself, it is easy to copy this file elsewhere without having to worry about system-information or loss of information.
You simply copy the table-file you need; you can even copy it into another user-directory (see below).

Qgdbm may be run in two modes: without system-tables (the default) or with system-tables.

The system-tables (to be exact, in version 0.3 the single system-table named system.qg) hold information about users. You can create tables which belong to a specified user (just like in "normal" databases). Currently no separate rights are given to different users (maybe this will come in one of the next versions of Qgdbm), so the paradigm is: everyone sees everything.

Each user has his own directory (directly under the root-directory). Tables can be specified as <user>.<table>, the same way you access a different schema (e.g. in Oracle).
Managing users is only possible with the "With-System" option.
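
The mapping from a table name to its file can be pictured with a few lines of plain Tcl. This is only a sketch under assumptions: the proc tableFile is hypothetical and not part of Qgdbm, and it merely combines the rules stated in this document (one file per table, one directory per user directly under the root-directory, names converted to lowercase).

```tcl
# Hypothetical helper, not part of Qgdbm: compute the file a table name
# would map to. rootdir and ext stand for the -rootdir and -dbext options.
proc tableFile {rootdir table {ext qg}} {
    # Only [a-zA-Z0-9_] is allowed, with an optional "<user>." prefix.
    if {![regexp {^([a-zA-Z0-9_]+\.)?[a-zA-Z0-9_]+$} $table]} {
        error "invalid table name '$table'"
    }
    # Names are lowercased so the same file is found on Windows and Unix.
    set parts [split [string tolower $table] .]
    # "grobi.cookies" becomes <rootdir>/grobi/cookies.<ext>
    return [eval [list file join $rootdir] $parts].$ext
}

puts [tableFile /tmp address]
puts [tableFile /tmp Grobi.Cookies]
```

On a Unix system the two calls should print /tmp/address.qg and /tmp/grobi/cookies.qg.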

Properties

Common features of Qgdbm are: Features which are not realized in Qgdbm:

Let's take a look at some simple examples:

1. Simple Example

package require qgdbm
qgdbm::init -rootdir /tmp/

qgdbm::tsql create table address \
  {{name char} {street char} {zip_code integer} {city char}}

qgdbm::tsql insert into address {$name $street} values \
    [list {Grobi Sesame-Street}]
qgdbm::tsql insert into address values \
    [list {Ernie Sesame-Street 12234 Sesame}]

set result [qgdbm::tsql select * from address]
qgdbm::cleanup
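This simple script creates a table address (which is stored in the file <rootdir-parameter>/address.qg) and inserts two rows: "Grobi" (name and street only) and "Ernie" (all columns). The variable result then holds both rows, in no specific order: {Grobi Sesame-Street {} {}} and {Ernie Sesame-Street 12234 Sesame}.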

2. Simple Example

package require qgdbm
qgdbm::init -rootdir /tmp/ -system 1

qgdbm::tsql create user grobi identified by grobis_password
qgdbm::tsql create table grobi.cookies \
    {{id integer} {name char} {amount integer}}

qgdbm::cleanup
In this example the Qgdbm-package is initialized with the system-option. With this option we are able to create users, and tables which belong to a specific user with the notation <username>.<table> (e.g.: grobi.cookies).

Commands

These commands are available in Qgdbm:

Some further helpful commands:

The following conventions are made:
<table> The name of a table (it may contain only letters, digits and underscores: [a-zA-Z0-9_]). It is either user-unspecific (e.g. address) or in a user-schema (e.g. ernie.address).
Because the tablename maps directly to a filename, the tablename is converted to lowercase (this avoids conflicts between Windows and Unix filename-conventions).
<coln> Name of the n-th column in a table. Valid characters are [a-zA-Z0-9_].
The column-names are case-sensitive. I suggest using either all-uppercase or all-lowercase column-names, because you have to write the corresponding column-variables exactly as you defined them.
<dattypn> The (n-th) datatype of a column.
One of the following possible datatypes for a column:
char - strings of arbitrary-length, where arbitrary really means arbitrary. I didn't check for a maximum length. Is there a limit for Tcl?
integer - whole numbers (nothing to explain)
real - floating-point numbers (e.g.: 3.141)
date - the internal tcl-representation of date/time, that is: integer (e.g.: [clock seconds])
Datatypes are case-insensitive, so you can even mix upper- and lowercase in create- or alter-table statements.
These datatypes are used when a sorting-column is provided in a select-statement (the datatype is passed to the Tcl lsort command to get the right sorting-criterion) and when inserting or updating is done.
<conn> A constraint for the n-th column.
A valid tcl-expression which is used to check if a value inserted or updated is valid. Be careful with quoting when specifying constraints.
<username>, <password> The name of a user (valid characters are [a-zA-Z0-9_]) and the password (all characters are valid). The password is stored unencrypted in the database. This is not a security hole, because "everyone sees everything" applies anyway.
The username is used to create a directory directly under the root-dir. To avoid conflicts when using Qgdbm under Windows, Unix, ... the username is converted to lowercase. So the username is case-insensitive.
where <expr> The where-condition in Qgdbm is realized via the Tcl-expression mechanism. The columns are specified like "normal" values of Tcl-Variables. If you have the table address (see Simple Example), you could search for an entry like this:
qgdbm::tsql select * from address where {"$name" == "Ernie"}
In the where-expression you can specify almost anything (even your own functions), e.g.:
qgdbm::tsql select * from address where {[string match "Washington" $city] || ("[string tolower $name]" == "clown")}
Be sure to do the quoting right, because the where-expression is first substituted (to fill in the variable-values you specified) and then sent to expr.
pklist <pklist> In most cases you would like to access the data stored in Qgdbm via the primary key (this is always the first column you specified in the create table-statement). To simplify the parsing of tsql-commands you have to give these primary keys explicitly to qgdbm::tsql. This is done with the parameter pklist.
When you do not tell qgdbm::tsql the keys you want to search for with the parameter pklist, a full-table-scan is done. pklist is a "normal" Tcl-list, e.g.
qgdbm::tsql select * from address pklist {Ernie Clown}

If both pklist and where-expr are given to qgdbm::tsql [select|delete|update], only the rows with the given primary keys are checked against the where-expr. That means pklist and where-expr are combined via AND.
<Null> This is not the normal database NULL-value (that is, "no value"). It is simply an empty string. There is no distinction between no value and an empty value.
In all the examples the reference-table is the address-table.
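
How where-expr and pklist work together can be sketched in plain Tcl without Qgdbm. The following is an illustration under assumptions, not the package's implementation: the procs rowMatches and selectKeys are hypothetical, a Tcl array stands in for the table-file, and skipping unknown keys during a select is assumed by analogy with the update-command.

```tcl
# Hypothetical sketch, not Qgdbm's code: the array is keyed by the primary
# key, each value being the list of the remaining column-values.
proc rowMatches {_cols _row _where} {
    # Make every column available as a local variable ($name, $city, ...).
    foreach _c $_cols _v $_row { set $_c $_v }
    # First substitute the variables, then hand the result to expr.
    return [expr [subst $_where]]
}

proc selectKeys {tableVar cols {where 1} {pklist {}}} {
    upvar 1 $tableVar table
    # Without pklist every key has to be visited (full-table-scan).
    if {[llength $pklist] == 0} { set pklist [lsort [array names table]] }
    set hits {}
    foreach pk $pklist {
        # Assumption: keys that are not in the table are skipped.
        if {![info exists table($pk)]} continue
        set row [concat [list $pk] $table($pk)]
        # pklist AND where: only listed keys are tested against the expression.
        if {[rowMatches $cols $row $where]} { lappend hits $pk }
    }
    return $hits
}

set cols {name street zip_code city}
array set address {
    Ernie    {sesame-street  12345 Sesame}
    Clown    {Whitehouse-Av. 9999  Washington}
    Quichote {Windmillstreet 455   {Don't know where}}
}

puts [selectKeys address $cols {"$city" == "Washington"}]            ;# Clown
puts [selectKeys address $cols {$zip_code > 5000} {Ernie Quichote}]  ;# Ernie
```

The quotes around "$city" matter in this sketch: after substitution the expression reads "Washington" == "Washington", which expr can compare as strings. Without the quotes expr would see a bare word and fail.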

qgdbm::init/qgdbm::cleanup (Initialization/Cleanup)

Syntax: qgdbm::init
-root <rootname> Default system
-rootdir <root-directory> Default ""
-hd <header-key> Default @
-dbext <database-file-extension> Default qg
-system <with-system-or-not> Default 0
-log <with-logging-or-not> Default 0
-reorganize <number-of-deleted-records> Default 300
-help


qgdbm::cleanup

Return-Value: When qgdbm::init is called with -help the values of all parameters are returned.
Description: With qgdbm::init the Qgdbm-Database is initialized. If you forget the call to qgdbm::init strange errors might happen.
The parameters have the following meaning:
-dbext: The Qgdbm-database-file-extension. The default is ".qg". This should be changed only for a very good reason.
-root: The name of the root-table (only used with -system 1; useless otherwise) without extension (-dbext). This should not be changed either.
-rootdir: The place of the database. Under this directory the system-table-file and the user-directories will be created. With this option you can keep several databases side by side. With a simple call to qgdbm::init you switch to another database.
-hd: The key of the admin-row in each table. The default is @. In case you need a key with this symbol you may set the header-key to something different. But remember that you have to do this for the whole database and each time you initialize it, or else nothing will work. It is a good idea to leave this value untouched.
-system: 0 or 1. With -system 1 a system-table is created (if it doesn't exist). If you want to create different users you have to initialize with -system 1. Otherwise (the default, -system 0) no users can be created and all tables are created directly under rootdir.
-log: 0 (default) or 1. If Qgdbm is initialized with -log 1, a log-table named "log.qg" is created. Nearly all commands (select, delete, ...) will leave an entry in this log-table. This option is provided for debugging-purposes and should not be used unless you really want to slow down the database-operations.
-reorganize: 300 (default). When data is deleted from gdbm, the space is not freed until a call to reorganize is made. With this option you can configure when to reorganize the table-files. It defaults to 300 (that is, after 300 deletes a call to reorganize is made automatically, thereby shrinking the file-size to the minimum needed). Depending on the size of the data-rows and the file-size you are willing to accept, this can be set to a higher or lower value. With -reorganize 0 or -reorganize "" no automatic reorganization is done.

Be sure to call qgdbm::cleanup when you are finished with your database-operations, because Qgdbm keeps the table-files open until told to close them (with cleanup). Cleanup closes all opened gdbm-handles and removes the locks on the tables.

qgdbm::tsql ALTER TABLE

Syntax: qgdbm::tsql alter table <table> add \
    {{<col1> <dattyp1> [<con1>]} {<col2> <dattyp2> [<con2>]} ...}
qgdbm::tsql alter table <table> modify \
    {{<col1> <dattyp1> [<con_1>]} {...} ...}
qgdbm::tsql alter table <table> drop {<col1> <col2> ...}
qgdbm::tsql alter table <table> rename {{<col1> <col1_newvalue>} {<col2> ...} ...}
Return-Value: None
Description: These commands are used to modify the columns of a table. With drop all the named columns are removed (including all data for these columns!). You cannot remove the primary key-column!
modify changes the datatypes and constraints of the given columns. A check is made whether all existing values are valid with regard to the new datatype and constraint. If not, an error is thrown.
With add columns can be added to the table. <Null>-Values are inserted for these columns.
rename renames the columnnames of the listed columns to the given new names.
Example: qgdbm::tsql alter table address add column [list {houseno char}]
qgdbm::tsql alter table address drop column city

qgdbm::tsql ALTER USER

Syntax: qgdbm::tsql alter user <username> identified by <password>
Return-Value: None
Description: Change the password of user <username>. If the user does not exist, an error is thrown.
Example: qgdbm::tsql alter user grobi identified by grobis_new_password

qgdbm::tsql CREATE TABLE

Syntax: qgdbm::tsql create table <table> \
    {{<col1> <dattyp1> [<con1>]} {<col2> <dattyp2> [<con2>]} ...}
Return-Value: None
Description: Create a table with the given columns and their corresponding column-datatypes and constraints. Constraints are optional. But remember the first column is always the primary key and this column is never optional.
A constraint is defined like the where-expr in a select-statement. The corresponding column is provided in a variable with the name of the column.
When an error occurs while inserting multiple rows at one time no row will be inserted.
When you specify a row with a primary key that's already in the database an error is thrown and no value is inserted.
When a table is created, a file named "<table>.qg" is generated in the root-directory or in the specified user-directory. (See also the description of <table>.)
Are you looking for an equivalent to the standard SQL
create table xyz as select * from abc (that is, create a new table xyz which has exactly the same structure and the same data as abc)?
You don't need a tsql-statement for that. Simply copy the file abc.qg to xyz.qg. That's one of the benefits of using one file for each table.
Rename a table? Simply rename the corresponding file. But be sure to have closed the Tcl-application that uses this table!
Example: Creating a column with a NOT NULL-constraint is simple:
qgdbm::tsql create table person \
    {{persnr integer} \
    {last_name char {[string length $last_name]}} \
    {first_name char} \
    {salary integer}}

Be sure to quote the constraint correctly, because the constraint is substituted with subst before it is evaluated with expr.
For further examples see Examples.
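
The "subst first, expr second" handling of a constraint can be tried out in plain Tcl. This is a sketch under assumptions: checkConstraint is a hypothetical proc and not a Qgdbm command, and the wording of the error is borrowed from the error message shown in the Examples.

```tcl
# Hypothetical sketch of a constraint check; not part of Qgdbm.
proc checkConstraint {_col _value _constraint} {
    # An empty constraint means "no restriction".
    if {$_constraint eq ""} { return }
    # The value is visible under the column's own name, e.g. $last_name.
    set $_col $_value
    # subst fills in the value, expr decides whether it is acceptable.
    if {![expr [subst $_constraint]]} {
        error "Column '$_col' with value '$_value' doesn't fulfill constraint '$_constraint'."
    }
}

# A non-empty name passes silently.
checkConstraint last_name Vogel {[string length $last_name]}

# An empty name violates the NOT NULL-style constraint.
if {[catch {checkConstraint last_name {} {[string length $last_name]}} msg]} {
    puts $msg
}
```

In this sketch a constraint written as {"$last_name" != ""} would break for a value that itself contains a double quote, because the substituted text is parsed again by expr. Wrapping the test in a command such as [string length $last_name] avoids that second parse of the value.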

qgdbm::tsql CREATE USER

Syntax: qgdbm::tsql create user <username> identified by <password>
Return-Value: None
Description: Create the user named <username>. This command is only available when Qgdbm is initialized with the option -system 1, because the user-information is stored in the system-table system.qg in the root-directory.
Example: See the Simple Example.

qgdbm::tsql DROP TABLE

Syntax: qgdbm::tsql drop table <table>
Return-Value: None
Description: The file corresponding to table <table> will be deleted. An error will be thrown if there is no table <table>.
Example: qgdbm::tsql drop table grobi.cookies

qgdbm::tsql DROP USER

Syntax: qgdbm::tsql drop user <username>
Return-Value: None
Description: Delete the directory corresponding to this user and all tables below it. You should be really sure that you want to do this.  :-)
Example: qgdbm::tsql drop user grobi

qgdbm::tsql DELETE

Syntax: qgdbm::tsql delete from <table> [where <expr>] [pklist <pklist>]
Return-Value: None
Description: Deletes the specified rows from table <table>. The where-expr and the pklist are optional.
If no where-condition or pklist is specified all rows will be deleted.
If both are given, the given list of primary keys (pklist) will be searched for a matching where-condition.
Example: qgdbm::tsql delete from grobi.cookies
(deletes everything)

qgdbm::tsql delete from address where {"$street" == "sesame-street"}
or better because the quoting is less confusing:
qgdbm::tsql delete from address \
    where {[string equal $street "sesame-street"]}

qgdbm::tsql delete from address pklist {Ernie Clown}

qgdbm::tsql INSERT

Syntax: qgdbm::tsql insert into <table> {<col1> <col2> ...} \
    values {{<val11> <val12> ...} {<val21> <val22> ...} ...}
qgdbm::tsql insert into <table> \
    values {{<val11> <val12> ...} {<val21> <val22> ...} ...}
Return-Value: None
Description: If you insert values in the Qgdbm-Database be sure to always give the primary key-column (the first column in the table-definition). The columns are specified as usual via the "value"-notation with a $-prefix.
(You could alternatively give the columns without $; The $-prefix is only for consistency)
If you do not call insert with the column-parameter, you have to give values for all columns of the table or else an error is thrown.
The values are provided as a list of lists. Therefore you can use the result of a select-statement directly to insert. This implies that you have to make an additional list, if you want to insert only one row.
Example: See this Simple Example.
qgdbm::tsql insert into address_new {$name $street} \
    values [qgdbm::tsql select {$name $street} from address]
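
Why a single row needs an additional list is easiest to see by looking at the list structure itself. The following plain Tcl lines do not call Qgdbm; the variable names are made up for the illustration.

```tcl
set name Bert
set city {New York}

# One row still needs the outer list: a list holding a single row.
set oneRow [list [list $name $city]]
puts [llength $oneRow]                 ;# 1 (one row)
puts [llength [lindex $oneRow 0]]      ;# 2 (two column-values)

# Without the outer list the structure has two elements, which a
# list-of-rows reader would take for two rows of one value each.
set wrong [list $name $city]
puts [llength $wrong]                  ;# 2

# Braces suppress substitution, so this row holds the literal text $name.
set literal {{$name $city}}
puts [lindex [lindex $literal 0] 0]    ;# $name
```

A value built like oneRow is what would be passed after the values keyword, for example: qgdbm::tsql insert into address {$name $city} values $oneRow.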

qgdbm::tsql SELECT

Syntax: qgdbm::tsql select {<col1> <col2> ...} from <table> \
    [where <expr>] [pklist <pklist>] \
    [order_asc|order_desc <coln>]
qgdbm::tsql select * from <table> \
    [where <expr>] [pklist <pklist>] \
    [order_asc|order_desc <coln>]
Return-Value: List of selected and possibly ordered values
Description: Select the columns given in col1 col2 ... (or all columns if * is given) from all rows which match the where-expr and/or the pklist.
The where-expr and the pklist are optional.
Columns have to be provided in $-notation.
You are not limited to column-names in the select-statement: as in "real" SQL you can also provide constants (see Examples). The result is a list of lists (that is, a list of rows, each being a list of column-values).
Be sure to use the parameter pklist as often as possible, because without this parameter a full-table-scan is always done.
If an order-criterion is provided, the rows are ordered by the given column, ascending or descending. Only one column can be given.
The $-prefix of the order-column is optional. In fact the $ is removed from the order-column, but the syntax is provided for consistency with the other columns.
Example: See the example-section.
Select the names from table address in uppercase characters together with the system-date:
qgdbm::tsql select {[string toupper $name] [clock seconds]} from address
Want a row-counter in the selected fields? Simply put your code into the select-statement:
set i 0
qgdbm::tsql select {[uplevel #0 {incr i}] $last_name,$first_name} \
    from person

But be careful not to mess with the column-variables.
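
One way to picture how a select-list with commands and constants can turn into a result row is to let Tcl parse it as the arguments of list. This is an assumption about the mechanism, shown as a sketch in plain Tcl: projectRow is hypothetical and not a Qgdbm command.

```tcl
# Hypothetical sketch: build one result row from a select-list.
proc projectRow {_cols _row _select} {
    # Column-values become local variables named after the columns.
    foreach _c $_cols _v $_row { set $_c $_v }
    # Let Tcl parse the select-list as the arguments of [list]; every
    # word ($name, [string toupper $name], a constant) becomes one value.
    return [eval list $_select]
}

# The counter lives in the global namespace, where uplevel #0 finds it.
set ::i 0
set cols {persnr last_name first_name salary}
foreach row {{1 Vogel Stefan 100} {2 Monster Cookie 200}} {
    puts [projectRow $cols $row {[uplevel #0 {incr i}] $last_name,$first_name}]
}
# prints: 1 Vogel,Stefan
#         2 Monster,Cookie
```

The sketch also shows why the column-variables must not be messed with: any command in the select-list runs in the same scope as the variables holding the current row.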

qgdbm::tsql UPDATE

Syntax: qgdbm::tsql update <table> {<col1> <col2>...} {<val1> <val2>...} \
    [where <expr>] [pklist <pklist>]
Return-Value: None
Description: Update the given columns of the affected rows with specified values. The affected rows are determined the same way as in the select or delete-statement.
Keys given in pklist, which are not found in the table are silently ignored.
The given columns (again given with $) are updated with their corresponding values.
For the update-statement you need not prefix the columns with $. The $-notation is only provided for consistency. For convenience you can give the column-names only (this avoids a little bit of quoting, too).
Example: See example-section.

qgdbm::descTable

Syntax: qgdbm::descTable <table>
Return-Value: None
Description: This prints a description of the given table to stdout, similar to a describe <tablename> in Oracle-SQL.
This is provided as a convenience-function, so you don't have to mess with the table-header stored in key @.
The output (see example) gives the current working directory (this is needed for the determination of the table-file), the associated gdbm-handle, the corresponding database file (joined with the rootdir specified in the initialization-call qgdbm::init), the primary key (PK) and the other fields with their names, datatypes and possible constraints.
Furthermore the number of entries is given (Size). Remember that the header element is always counted, so the actual number of entries is (size - 1).
The date of creation of this table is printed together with some primitive statistics of the table: the number of inserts, selects, updates and deletes.
Example:
% qgdbm::descTable address
Current working directory: /usr/svogel/tclgdbm/tests
Current gdbm-handle      : gdbm28
DBFile : address.qg (Version: 0.3)
PK     : name (char)
Fields : street (char)
Fields : zip_code (integer)
Fields : city (char) constraints: '[string length $city]'
Size   : 4
Created: 02/12/00 19:09:10

Statistics:
No insert: 3
No select: 9
No update: 0
No delete: 0

qgdbm::gdbmHandle

Syntax: qgdbm::gdbmHandle <table>
Return-Value: Corresponding gdbm-handle to the given table
Description: In case you want to access the table-file directly with the commands from Tclgdbm, you can retrieve the gdbm-handle with this command.
Example: To determine the number of entries:
set size [[qgdbm::gdbmHandle address] count]
Remember that this counts the header-key @. So the real number of data entries is [expr {$size - 1}].

qgdbm::forceReorg

Syntax: qgdbm::forceReorg <table>
Return-Value: None
Description: If you want to do the reorganization of gdbm-files by hand (specifying -reorganize 0 in the call to qgdbm::init), you may force immediate reorganization by calling forceReorg.
Example: qgdbm::forceReorg address

qgdbm::headerField

Syntax: qgdbm::headerField <table> <field>
Return-Value: Value of field <field> in header of <table>
Description: To access the system-information stored in each table you can read the different fields stored in key @ with this command.
The currently defined fields are:
version: Versionnumber of Qgdbm (should be 0.3).
createdate: The date of creation of the file as the internal tcl-clock-value.
fields: A list of field/column-names where the first listelement is the primary key.
types: A list of datatypes of the columns in the same order as in fields.
constrs: A list of constraints for the columns.
no_insert: The number of inserts.
no_update: The number of updates.
no_delete: The number of deletes.
Example: qgdbm::headerField address fields
returns: name street zip_code city

qgdbm::log

Syntax: qgdbm::log command table time number_of_rows
Return-Value: None
Description: This command only works when qgdbm::init -log 1 is called for initialization.
log is used internally for time-measurement but you can use it for logging-purposes, too. The format of the log-table is:
id: primary key, a simple id (which is calculated in log).
command: The column to hold the command-string (or any other string you provide as command).
table: The table name for which the command is executed.
time: The time in microseconds (as returned by time) needed for command.
number_of_rows: The number of rows affected by this command.
You can use this command to log your own messages. But because logging is only enabled with -log 1 there is a lot of overhead done for logging, which slows down Qgdbm. So defining your own log-table and your own log-command would be much better.
Example: qgdbm::log select address 0 1

Examples

Example #1: Create a table

Assume we want the following table-specification for address-data:
Columnname  Datatype
name        char      primary key
street      char
zip_code    integer
city        char      NOT-NULL
Not that it is a good idea to assume that the zip_code is a number, but anyway.
The corresponding command would be:

qgdbm::tsql create table address \
    {{name char} \
     {street char} \
     {zip_code integer} \
     {city char {[string length $city]}}}

Example #2: Insert values into a table

We assume the following data to be in this table:
name      street          zip_code  city
Ernie     sesame-street   12345     Sesame
Clown     Whitehouse-Av.  9999      Washington
Quichote  Windmillstreet  455       Don't know where

These could be inserted in the table address as follows:
set data {Ernie    sesame-street  12345 Sesame
          Clown  Whitehouse-Av. 9999  Washington
          Quichote Windmillstreet 455   {Don't know where}}

foreach {name street zip city} $data {
  qgdbm::tsql insert into address {$name $street $zip_code $city} \
    values [list [list $name $street $zip $city]]
}
or if you have your data in another datastructure:
set data {{Ernie    sesame-street  12345 Sesame}
          {Clown    Whitehouse-Av. 9999  Washington}
          {Quichote Windmillstreet 455   {Don't know where}}}
qgdbm::tsql insert into address values $data
You can also insert specific columns as in:
qgdbm::tsql insert into address {$name $city} values {{Bert Sesame}}
Would insert: {Bert {} {} Sesame}. Whereas
qgdbm::tsql insert into address {$name $street} values \
    {{Bert sesame-street}}
would throw an error:
Column 'city' with value '' doesn't fulfill constraint
'[string length $city]'.

Example #3: Select values from a table

  • To select entries from this table you can use the following commands:

    qgdbm::tsql select {$name $street} from address
    would result in (no specific order):
    {{Ernie sesame-street} {Quichote Windmillstreet} {Clown Whitehouse-Av.}}

  • Select with an order-clause:
    qgdbm::tsql select * from address order_asc {$name}
    result is:
    {Clown Whitehouse-Av. 9999 Washington} {Ernie sesame-street 12345 Sesame} {Quichote Windmillstreet 455 {Don't know where}}

  • Select with an order-clause and where-condition:
    qgdbm::tsql select * from address where {"$city" == "Washington"} \
       order_desc {$city}
    results in:
    {Clown Whitehouse-Av. 9999 Washington}

  • Another select with where and order that fails:
    qgdbm::tsql select {$name} from address where {$zip_code > 5000} \
        order_asc {$city}

    would throw an error, because the column in the order-clause has to be in the selected-column list.

  • Here is a better way:
    qgdbm::tsql select {$name $city} from address where {$zip_code > 5000} \
        order_desc city
    (the $-prefix in the order-column is optional!)
    result:
    {Clown Washington} {Ernie Sesame}

  • Select using pklist:
    qgdbm::tsql select {$street} from address pklist {Ernie Clown}
    returns:
    sesame-street Whitehouse-Av.
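
The role of the column-datatype in an order-clause can be tried out with lsort alone. The sketch below makes assumptions: sortOption and orderRows are hypothetical procs, and the mapping from datatype to lsort option (in particular -ascii for char) is a guess, not taken from Qgdbm.

```tcl
# Hypothetical mapping from a column-datatype to an lsort option.
proc sortOption {datatype} {
    switch -- [string tolower $datatype] {
        integer - date { return -integer }
        real           { return -real }
        default        { return -ascii }
    }
}

proc orderRows {rows colIndex datatype direction} {
    # direction is -increasing (order_asc) or -decreasing (order_desc)
    return [lsort [sortOption $datatype] $direction -index $colIndex $rows]
}

set rows {
    {Ernie sesame-street 12345 Sesame}
    {Clown Whitehouse-Av. 9999 Washington}
    {Quichote Windmillstreet 455 {Don't know where}}
}

# Sorted as integer the zip_codes run 455, 9999, 12345.
puts [orderRows $rows 2 integer -increasing]
# Sorted as char the same column runs "12345", "455", "9999".
puts [orderRows $rows 2 char -increasing]
```

The two calls return the rows in different orders, which is why the datatype declared in the create table-statement matters for order_asc and order_desc.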

Example #4: Update values in a table

  • To update some rows use the update-command as in these examples:
    qgdbm::tsql update address {$street} Sesamestreet \
        where {"$name" == "Ernie"}
    qgdbm::tsql update address {$street} Sesamestreet pklist {Ernie Bert}

  • This will result in an error, because the primary key could not be updated:
    qgdbm::tsql update address {$name} Hugo pklist Ernie

  • The following statement updates all rows in address:
    qgdbm::tsql update address {$city} Berlin

Performance

You may ask whether Qgdbm is fast enough, because it is based on Tcl. The answer is: that depends (as always). Comparing it with a "real" database is a bit unfair, but let's compare it with pure Tclgdbm. Because Qgdbm is based on Tclgdbm, we can see the overhead added to Tclgdbm to make it easier to use.
The times were taken under Linux on an Intel Pentium 133 computer with Tcl 8.2.
Figure 1: Measured time for Inserts in Tclgdbm and Qgdbm
For the first test 1000 rows were inserted. Each row was about 250, 500 or 1000 bytes (which makes a total of nearly 250, 500 or 1000 KB; to be exact 244, 488 and 976 KB).
Inserting nearly 1 MB with Qgdbm (using 1000 qgdbm::tsql insert ... operations) is slow: it takes 8.4 seconds. Inserting these values with one insert is much faster (4 seconds). But Tclgdbm surely cannot be beaten: it takes 0.7 seconds.
Qgdbm is around 10-15 times slower than Tclgdbm (in the 1 MB case, 8.4 seconds versus 0.7 seconds is a factor of 12).
Admittedly, inserting 1 MB in one chunk is not an everyday situation. Qgdbm is surely not a tool for time-critical applications, but in applications with user-interaction and a small or medium amount of data it is worth the overhead.
Let's have a look at the time for selects.
Figure 2: Measured time for Selects in Tclgdbm and Qgdbm
From the database-file with 1000 rows we select 100 rows. Again each row has 250, 500 or 1000 bytes (in total nearly 25, 50 or 100 KB are returned from the select-statement).
Again, Tclgdbm is really fast (under 50 milliseconds). When running a select-statement with a where-expression in Qgdbm the time is between 1 and 1.4 seconds. This is because an unspecific where-expression causes a full-table-scan.
Selecting the same values with the pklist-parameter makes Qgdbm much faster (between 90 and 160 milliseconds).
By the way, a full-table-scan on a table with 500 rows (instead of 1000) takes about half the time (around 0.5 to 0.7 seconds).
Whenever possible you should use the parameter pklist in the select-statement.
Furthermore you should limit your table-size to a maximum of 1000 rows.
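
The difference between a full-table-scan and keyed access can be reproduced in miniature with the Tcl time command. This sketch does not measure Qgdbm or Tclgdbm: a Tcl array stands in for the table-file, the procs fullScan and byKeys are hypothetical, and the absolute numbers depend entirely on the machine.

```tcl
# Build 1000 rows in an array keyed by the primary key.
for {set n 0} {$n < 1000} {incr n} {
    set table(key$n) [list street$n $n city$n]
}

# Full scan: a condition is checked for each of the 1000 rows.
proc fullScan {tableVar} {
    upvar 1 $tableVar table
    set hits {}
    foreach pk [array names table] {
        if {[lindex $table($pk) 1] % 10 == 0} { lappend hits $table($pk) }
    }
    return $hits
}

# Keyed access: fetch only the rows named in the key list.
proc byKeys {tableVar wanted} {
    upvar 1 $tableVar table
    set hits {}
    foreach pk $wanted {
        if {[info exists table($pk)]} { lappend hits $table($pk) }
    }
    return $hits
}

# The same 100 rows that the scan condition selects, named by key.
set wanted {}
for {set n 0} {$n < 1000} {incr n 10} { lappend wanted key$n }

puts "full scan: [time {fullScan table} 20]"
puts "by keys  : [time {byKeys table $wanted} 20]"
```

Both procs return the same 100 rows, but the keyed version touches only those 100 entries, which mirrors the advice to use pklist whenever the keys are known.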
Last changed: 24. Feb 2000 21:00 (MET)
Created: 08. Feb 2000

For suggestions, improvements, bugs or bug-fixes feel free to contact:

Stefan Vogel (stefan.vogel@avinci.de)