Robert A. Baron
Museum Computer Consultant


The SWAP Project

Building A Museum Database from the Bottom Up[1]

Based on a pre-publication version of an article appearing in
The International Journal of Museum Management and Curatorship,
Vol. 8, No. 1 (March 1989), pp. 11-32.


When it comes time to computerize their collections, registrars and curators of small museums commonly find themselves in a dilemma. On one hand, the features and capabilities offered by the newer commercial and professional collection control systems are more than they need, more than they can support, and more than they can afford. On the other hand, their own database management skills may be inadequate to address their problems effectively.[2]

For these embattled museum workers there has been little help. Discussions of databases for small collections rarely focus on the mechanics of building data systems, and database management tools that succeed in business may be inadequate for museum work. Questions of database structure, internal controls, and strategies for building small systems and entering data are often ignored at conferences and in papers. Yet these subjects are, in their own right, as important as vocabulary control or subject access. They must be addressed and understood if the difficulties of computerizing collection management and collection research are to be solved efficiently and skillfully.

* * *

This paper outlines the development of a simple object database designed for a medium-sized university art museum. As with other small systems, this one had to use resident equipment and a prescribed database management system, and had to adhere to severe limitations of time and budget. Yet, in spite of these constraints, within eight months a significant portion of the database had been constructed and over 33,000 inventory records had been entered. In the same period a previous database had been adapted to fit the new environment, nearly 140 standard reports had been defined, and full documentation had been prepared. More important than any of these achievements, however, was the creation of a development strategy. Under this strategy, required functions could be added and specialized needs accommodated.

Although the system is not currently in use, enough has been learned from the experience that it can be presented as a model for small museums planning their own automated programs. The results are of special interest to museums considering automating their collections management. They demonstrate how an inhospitable environment of old and inconsistent paper records yielded easily to simple database techniques. They also show how a complex and varied collection could be catalogued and indexed quickly, efficiently, and inexpensively.

Especially significant for small museums was the cost of the project. Today, all equipment, software and supplies may be purchased for about $6,000. Most museums will already own most of the required equipment, so the additional cost in most cases will be considerably less.[3]

The Method

For the present purpose, the database itself is less significant than the procedure used to build it--there will be no catalogue of data elements here. Rather, we shall see how data modeling and file definition, the design of entry and query utilities, and the entry of retrospective data became integral elements of a uniform development strategy.

The technique used has been dubbed "the bottom up approach" because its method of development is progressive and incremental. At the core of this system, and essential for all future growth, lies an authoritative object inventory. Each stage of development uses the inventory to produce a fully usable collections module. The strategy does not require that all current records be entered, nor that most paper records and files be entered in full.

Furthermore, and in this writer's view most importantly, the process works with the museum's accumulated paper records and tries to preserve the integrity of the data they contain. That is, it supports an environment congenial to the needs of scholars and curators while also providing the functions the administrator requires.

Most readers will be aware that in many automated systems practical necessity leads to compromise. Consequently, the scholarly staff cannot trust the automated data available. Meanwhile the registrar, in keeping his own data clear and concise, cannot accommodate the curator's need for accurate information. Object administrators commonly find themselves faced with the unwelcome prospect of rethinking traditional procedures and of keying in decades of outdated data. Worst of all, they must decide what to simplify, abridge, or omit.[4]

Database planners know the importance of controlling data structures, that is, of ensuring that the content of the data held in the system is in harmony with the structure of its files and the disposition of its fields. One may think of data as the vocabulary, the data structure among files as the syntax, and the rules that define their relations as the grammar of an artificial communication system. Business and administrative databases emphasize the transactional life of their objects (e.g. sales, clients, inventories). Fine arts databases, by contrast, must place greater emphasis upon the structure and definition of the attributes assigned to their objects, in addition to monitoring whatever administrative transactions are required. In arts databases one should expect to find a synthesis of form and content: the form of a database reflects its function, and the way it operates gives meaning to its objects.

The customary way to initiate small-scale descriptive databases begins with field selection, not with modeling the museum's data. This process often ignores the influence that the database management system will exert on the structure and disposition of its information.[5] Of course, the fields chosen must satisfy the needs of object description, of management functions, and of designated points of entry into the collection database. But they must also be designed in concert with the logical structure of the database management system.[6]

No matter how intricate these in-house systems may become, the usual practice is to devote one file per logical data group: object information in the object file, and so on. In these systems data are input record by record. This process treats the extant paper record as a fixed, inviolate unit and assumes that the database model reflects the structure of the paper files. It may be said to work from the top down.

The information system described here takes a totally different approach from this recognized standard (Fig. 1). Instead of slicing the process of data entry horizontally, so that museum records are entered object by object, this system slices the task vertically: data are entered by type. The file structure is determined by, and echoes, the data entry method. It builds the database from the "bottom up." This system does not use one large file burdened with the task of holding the mass of object data. Instead, the object record is divided into linked groups of specialized files, each dedicated to recording specific types of information. In the end, this process makes for faster queries, more efficient use of data storage resources, and a more supple system all around.
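The contrast between the two approaches can be illustrated with a small Python sketch. It is not SWAP code, since SWAP was built with Informix tools. For illustration only, it assumes that each specialized file is a table keyed by the four-part accession number. All file names and sample values are hypothetical. Data are loaded one type at a time, and an object's full record is assembled by joining the files on the shared key.

```python
from typing import Dict, List, Tuple

# Assumed key layout: (collection prefix, year, sequence, parts suffix).
AccessionKey = Tuple[str, int, int, str]

# Hypothetical specialized files, each holding one type of information.
inventory: Dict[AccessionKey, dict] = {}          # every object, entered first
objects: Dict[AccessionKey, dict] = {}            # descriptions, only where available
inscriptions: Dict[AccessionKey, List[str]] = {}  # a further specialized file


def enter_by_type(target: dict, rows: dict) -> None:
    """Vertical entry: load one kind of data for many objects in a single pass."""
    target.update(rows)


def object_view(key: AccessionKey) -> dict:
    """Assemble an object's record by joining the files on the shared key."""
    view: dict = {"accession": key}
    view.update(inventory.get(key, {}))
    view.update(objects.get(key, {}))
    view["inscriptions"] = inscriptions.get(key, [])
    return view
```

A file with no entry for an object simply contributes nothing to the assembled view, and no space is reserved for the missing data.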

SWAP in Context

In this paper the program will be called SWAP, though that was not the name under which it ran.[7] SWAP was developed as an independent enterprise for a museum participating in the Getty Museum Prototype Project.[8] SWAP benefited indirectly from the technical assistance provided by the Getty Trust and by Willoughby Associates, their advisers. SWAP was written in Informix for MS-DOS, the database management system chosen for the Prototype Project.[9]

The collection for which SWAP was written is diverse. It holds the finds of archaeological expeditions; collections of figurines, glass and ceramic vessels, both ancient and modern; tableware, furniture, and university portraits; Renaissance drawings and prints; Eastern manuscripts and decorative arts; paintings from the Middle Ages to the present; and more. Some objects have been in the custody of the university since its founding in the mid-eighteenth century. Many were collected as early as the nineteenth century but inventoried for the first time only in the early twentieth.

To create the database, the first obstacle was to overcome the inconsistencies of past cataloguing traditions. The internal museum catalogue was arranged by curatorial department and further broken down, hierarchically, but sometimes illogically, into sub-divisions defined by geographical origin, media, object type, stylistic and/or historical class. Some curatorial divisions were defined by culture (Far Eastern, Ancient, Medieval, European), and some by media or technique (Manuscripts, Prints, Drawings and Photography).

Within curatorial classifications, accession cards were further broken down into divisions defined by culture, media and object type. The card catalogue has developed organically through the years, and consequently manifests many inconsistent, non-parallel divisions. Whereas some areas are not partitioned at all, arranging all objects in accession order, others, the "Ancient" and "European" drawers, in particular, show many divisions. Compare the following examples with each other:

Class      Level 1   Level 2    Level 3   Level 4
Ancient    Ceramic   Vessel     Greek     Geometric
Ancient    Metal     Mirror
Ancient    Metal     Egyptian   Gold
European   German    Metal      Vessel

Most object cards were filed thematically, not in accession order.

The museum kept no shelf list or locations file, though it did have sets of manuscript ledgers in which all objects were assigned accession numbers. Unfortunately, these accession ledgers would rarely lead a researcher to the location of the accession card for an object.[10] For this reason, if for no other, the files needed an automated index of accession numbers and drawer locations.

Historically, several accessioning schemes were used to register objects. When the registration ledgers were begun in the 1920s, each object then in the collection was assigned an undated inventory number. Later acquisitions were labeled with the year of receipt, the sequence of accession, and a "parts" suffix. An identical parallel system was applied to the collection of works on paper. University portraits received their own identification numbers, following a published inventory. Any single accession number, therefore, could refer to several objects. Loans to the museum were registered with a similar system.

The above technical difficulties notwithstanding, the vitality of the museum made adoption of an advanced automated collections management system advisable. Because the museum is an academic institution closely tied to the department of art history, unusual demands are placed upon its small staff. Objects are frequently pulled from storage for class use. Each academic year sees more than twenty small didactic exhibitions of works culled from the collection. Visiting scholars are treated with great respect and hospitality, and given easy access to materials in storage. Moreover, alumni and faculty have loaned the museum hundreds, perhaps thousands, of unaccessioned objects, kept in storage or on display. Many of the museum's own works are lent out, sometimes to departments and exhibitions on campus, sometimes to large travelling exhibitions. Objects continually move in and out of conservation. Yet with all this activity, there is no public listing of the collection's contents. Researchers must be given direct access to the accession cards; ordinary students and the public usually have no access.

Obviously, the casual structure of object classifications hindered efficient use of the collection catalogue, and made the chores of the administrative personnel particularly time-consuming and frustrating--especially since there was no locations file. The formidable loan program needed automation, but because the museum had just built a new wing, and was to move nearly every object from the old space to temporary storage in the new wing while the old areas were renovated, an object and packing inventory was the first priority.

SWAP Data Structure

The Inventory File

Because SWAP is a "bottom up" collections system, it will be described here in the order it was created. SWAP began conventionally enough as a simple accession number check-list prepared to monitor the move to the new wing. This list was made from the accession card catalogue (Fig. 2).

Into SWAP's inventory file was entered each object's accession number and filing classification--establishing the needed index to the accession drawers. With this simple file, one could query by any combination of classification tiers, by accession number, or by any of its components.
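A query over such a file can be sketched as follows. This is a hypothetical Python illustration, not Informix query syntax. It assumes that each record carries the four accession components and its filing classification as a tuple: the class first, then the levels shown in the card-catalogue examples. The function matches any combination of tiers, given by position, together with any accession components. The years and sequence numbers in the usage are invented.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class InventoryRecord:
    prefix: str                      # collection prefix ("" assumed for standard acquisitions)
    year: int                        # year of acquisition
    sequence: int                    # acquisition sequence
    parts: str                       # parts suffix, "" if none
    classification: Tuple[str, ...]  # class, then level 1, level 2, ...


def query(records: Iterable[InventoryRecord],
          tiers: Optional[Dict[int, str]] = None,
          **accession: object) -> List[InventoryRecord]:
    """Return records matching every given tier (by position) and accession component."""
    tiers = tiers or {}
    hits = []
    for r in records:
        cls = r.classification
        # A tier beyond the end of a record's classification counts as a mismatch.
        if any(i >= len(cls) or cls[i] != value for i, value in tiers.items()):
            continue
        if any(getattr(r, name) != value for name, value in accession.items()):
            continue
        hits.append(r)
    return hits
```

With records modelled on the examples above, a query for "Metal" at the third tier finds only the German vessel, not the ancient mirrors. This is exactly the kind of inconsistency the card catalogue had accumulated.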

Borland's Superkey, a keyboard macro enhancer, was used to simplify the process of data entry (Fig. 3). A macro was designed to insert all repeating data, to send the database through its addition routine, and to open a new record for the next object. Only those elements which changed from object to object were input by hand. Indexed fields, which always slow data entry, were kept to a minimum; additional indices were added only after the basic inventory was complete. Normally fewer than six keystrokes were sufficient to make the system create a record. On a good day, working alone, I could enter one thousand objects.
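Superkey worked at the keyboard level by replaying keystrokes, so the following is only an analogy in Python. It is a hypothetical entry session that carries the repeating values forward and asks the operator only for the fields that change. The field names are assumptions, and the list stands in for the database's add routine.

```python
from typing import Callable, Dict, List


def entry_session(repeating: Dict[str, object],
                  table: List[dict]) -> Callable[..., dict]:
    """Return an entry function that fills in the repeating values automatically."""
    def enter(**changed: object) -> dict:
        record = {**repeating, **changed}  # repeating data first, then the typed fields
        table.append(record)               # stands in for the DBMS "add" routine
        return record
    return enter
```

Each call to `enter` corresponds to one pass of the macro. Only the changing elements, such as the sequence number and parts suffix, are supplied by hand.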

The accession number was cast as a "composite" set of four fields: collection prefix, year of acquisition, acquisition sequence, and parts identifier. This compartmentalization sped entry, facilitated queries, and made sorting more convenient. To avoid a costly renumbering program, it was vital that the accession number serve as the key identifier of the object. The overlapping schemes were resolved with the addition of the collection prefix. In this way, one system could identify all standard acquisitions, loans to the museum, and any other category which may prove useful. In this system the accession number, in all its four parts, also serves as the link connecting the constituent database components and as a convenient point of entry into any file using it. File integrity is maintained by a process which looks to the accession number for authority to enter and change records.
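The four-part number can be modelled as a small value type. This sketch is an illustration, not SWAP's definition. The prefix codes and the printed format are assumptions. Keeping the year and sequence as integers is what makes sorting behave: `1931.2` sorts before `1931.10`, whereas a plain string sort would reverse them.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class AccessionNumber:
    """Composite key: equality and sorting use the fields in this order."""
    prefix: str      # collection prefix; "" and "L" below are hypothetical codes
    year: int        # year of acquisition
    sequence: int    # acquisition sequence within the year
    parts: str = ""  # parts suffix, "" for a single-part object

    def __str__(self) -> str:
        text = f"{self.year}.{self.sequence}{self.parts}"
        return f"{self.prefix} {text}" if self.prefix else text
```

The prefix alone keeps apart a loan and a standard acquisition whose numbers are otherwise identical. This is how one key could cover the museum's overlapping numbering schemes.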

Informix's "composite" field type contains no data of its own, only pointers to its components. It occupies almost no space, yet the composite name can be specified in any query in place of its component field names. This is handy, of course, but, more importantly, files can be joined on the composite, one composite can "look up" another for verification, and composites can be indexed, either allowing or disallowing repeated values.

Properly used, these features permit great control over the contents of a database, and simplify its operation. For example, the composite of accession number fields was used to supervise the process of recording them. By indexing the composite of the accession number, and by choosing the proper indexing strategy for each component of the accession number, the system guaranteed uniqueness and sped searches. It took only 10 to 20 seconds to pull one record out of 33,000 on a slow IBM XT.
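The effect of a unique index on the composite can be sketched with a dictionary keyed by the four components. This is a stand-in, not Informix's index structure; a Python dict is a hash table. It shows only the two behaviours the text describes. Duplicates are refused at entry, and a single record is retrieved without scanning the rest of the file.

```python
from typing import Dict, Tuple

Key = Tuple[str, int, int, str]  # (prefix, year, sequence, parts)


class UniqueIndex:
    """Stand-in for a unique index on the composite accession number."""

    def __init__(self) -> None:
        self._rows: Dict[Key, dict] = {}

    def insert(self, key: Key, row: dict) -> None:
        # Refuse the entry outright, as a no-duplicates index would.
        if key in self._rows:
            raise ValueError(f"duplicate accession number: {key}")
        self._rows[key] = row

    def lookup(self, key: Key) -> dict:
        # Direct keyed retrieval; no scan of the other records is needed.
        return self._rows[key]
```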

The requirement of uniqueness did slow the entry process considerably. When the paper records were incorrect, as they often were, entry stopped while the correct number was researched. This was bothersome, to be sure. But the result of the effort was a highly accurate list of accession numbers, free from typographical errors, that would serve as the foundation on which to build the remainder of the database.

SWAP's inventory file used fewer than 120 characters per record, yet from this it was possible to produce important inventory documents: sorted lists by collection category, by object type, by accession year, by card-drawer order, and so on. Curators received their first-ever check-list of objects in their domain. Summary reports tallied objects by accession year and accession type. This modest flat-file database provided the museum with its first accurate count of its documented holdings in each area (Figs. 7 and 13).
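A summary of this kind is essentially a tally. The sketch below, in Python with assumed field names, counts inventory records by any one field and appends a total line. It illustrates the idea rather than reproducing any SWAP report.

```python
from collections import Counter
from typing import Iterable, List


def summary(records: Iterable[dict], field: str) -> List[str]:
    """Tally records by one field: one sorted line per value, then a total line."""
    counts = Counter(r[field] for r in records)
    lines = [f"{str(value):<12}{n:>6}" for value, n in sorted(counts.items())]
    lines.append(f"{'Total':<12}{sum(counts.values()):>6}")
    return lines
```

Calling `summary(records, "year")` or `summary(records, "prefix")` gives the counts by accession year or by accession type.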

Beyond the Inventory File

When the inventory file was complete, SWAP's second file was drawn out of the first (Fig. 7, top). Without additional data entry, each unique set of classification values was read into a master file. This file was turned into a "lookup" authority, against which any new or changed classification had to be tested before the computer would accept a new entry into the inventory file.[11]
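The two steps, deriving the authority and then testing against it, can be sketched in Python. The record layout (a `classification` tuple per record) is an assumption made for illustration. The authority is simply the set of distinct classification tuples already present in the inventory.

```python
from typing import Iterable, Set, Tuple

Classification = Tuple[str, ...]


def build_authority(inventory: Iterable[dict]) -> Set[Classification]:
    """Read each unique set of classification values into a master list."""
    return {tuple(record["classification"]) for record in inventory}


def check_classification(record: dict, authority: Set[Classification]) -> None:
    """Refuse a new or changed record whose classification is not on the authority."""
    if tuple(record["classification"]) not in authority:
        raise ValueError(f"unknown classification: {record['classification']}")
```

No additional data entry is involved: the authority comes entirely from values already keyed into the inventory.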

Although the database now served as an authoritative inventory mechanism and index, at this stage it addressed only the two most important entry points for object designation: Accession Number and Object Classification. It contained none of the traditional fields for object queries or object description. In fact, there were no object descriptions at all, no makers, no subjects, no titles, no media, no size, no dates, no styles, no cultures, and no donors or valuations.

To include these elements, conventional wisdom for fixed-field database users would have the inventory file expanded by adding fields. Fields added in this way would be present in every record, even where they were not needed.[12] This procedure was not used. Instead, the inventory file was left as is and turned into an authority file: the "lookup" file to which object records would refer (Figs. 7, bottom, and 13).

Here is one of the practical benefits of the "bottom up" method: With the inventory file complete, there was no need to describe every object the museum owned. The thousands of accessioned objects of no current interest, never shown, never cited, never even described in the paper accession records, could be omitted from the object file--at least until descriptions were needed or available.[13]

The inventory file and the object file each prohibited duplicate accession numbers, so there could be only one object record per inventory record. However, since every object part was given its own unique entry in the inventory file, each part or accession subdivision could be counted as an independent entity. A sketch-book, or a portfolio of prints, for instance, might contain pictures by different artists. Each drawing or print could be described and attributed independently, or not, as desired, even as the item was described as a whole.[14]
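The rules just described can be sketched in Python. The assumptions are ours. Files are dictionaries keyed by a four-part accession tuple. The parts of a sketch-book share a base number and differ only in their parts suffix, and the whole item is entered under an empty suffix. An object record is accepted only if its number is in the inventory and has no object record yet. The artist names in the usage are placeholders.

```python
from typing import Dict, Tuple

Key = Tuple[str, int, int, str]  # (prefix, year, sequence, parts)


def add_object(objects: Dict[Key, dict], inventory: Dict[Key, dict],
               key: Key, description: dict) -> None:
    """Add an object record only for an inventoried number that lacks one."""
    if key not in inventory:
        raise KeyError(f"{key} is not in the inventory")
    if key in objects:
        raise ValueError(f"{key} already has an object record")
    objects[key] = description
```

Because each part has its own inventory entry, each drawing in the sketch-book can carry its own attribution alongside a record for the volume as a whole.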

Resource Management: Bernoulli Box Strategy

The entire SWAP system ran on modest equipment: an original IBM XT with a 10 megabyte hard disk, a Bernoulli Box with two removable 20 megabyte cartridges, and a cheap Epson printer.[15]

The use of interchangeable media was essential to the development plan of the database and offered a way to overcome some of its inherited constraints (Fig. 8). Using the Bernoulli Box, the database could be permitted to grow larger than the total online capacity. To do this, one Bernoulli disk was designated permanent. It held the large inventory and object files, and had to remain on line. The other disk was considered interchangeable, and might hold various administrative or descriptive files, plugged in as needed--manually.

Unlike some database management systems, such as dBase III or R:Base, SWAP's database manager, Informix, permits its data files to be distributed among device volumes. Furthermore, Informix does not require the entire database to be available: if a file is not called, it does not have to be on line. Thus donor, valuation, loan, specialized object description, notes, inscription, bibliography, and provenance files, among others, could be available or not, as each task required. Bernoulli disk management was assigned to the system interface.[16]
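The bookkeeping the interface had to do can be sketched as a lookup from files to volumes. Everything here is hypothetical. The file names echo the examples in the text, but the volume labels and the assignment of files to cartridges are invented for illustration. Given the files a task calls, the function reports which volumes the operator still has to insert.

```python
from typing import Dict, Iterable, Set

# Hypothetical placement of files on volumes; labels are illustrative only.
FILE_VOLUME: Dict[str, str] = {
    "inventory": "permanent",
    "objects": "permanent",
    "donors": "admin cartridge",
    "valuations": "admin cartridge",
    "loans": "admin cartridge",
    "inscriptions": "research cartridge",
    "bibliography": "research cartridge",
}


def volumes_to_insert(files: Iterable[str], mounted: Set[str]) -> Set[str]:
    """Volumes a task needs that are not currently on line."""
    needed = {FILE_VOLUME[name] for name in files}
    return needed - mounted
```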

SWAP's master Query-Update-Entry screen for the object catalogue provides general access to its main object files and facilitates interactive "many-to-many" queries (Fig. 9). The form is crowded because all data elements are displayed on a single screen. It is not designed for rapid data entry, but, rather, for general system maintenance and ad hoc form queries, allowing the user to investigate the central portion of the catalogue and object/maker relationships.[17]

The 20 megabyte Bernoulli disk imposed severe restrictions upon the size of the object record, strictly limiting the list of possible fields. This limitation was turned into an asset. The fields selected had to be only those common to all objects and those required for the most fundamental queries. Coded fields annotated dates and other field types. Superkey "pop-up" help-screens, loaded by the interface, were used to enter codes and explain their significance. Data-form programming would display the meaning of important codes as they were entered or when they appeared in queries.[18] As a result, the object file was densely packed, with little waste, but not overloaded with inexplicably obscure codes.
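A coded field paired with a display-only explanation might work as follows. The code table is hypothetical, since SWAP's code lists are not given here. The function rejects unknown codes at entry. It returns the stored code together with the meaning a display-only field would show beside it.

```python
from typing import Dict, Tuple

# Hypothetical date-qualifier codes; SWAP's actual code lists are not documented here.
DATE_QUALIFIERS: Dict[str, str] = {
    "c": "circa",
    "b": "before",
    "a": "after",
    "f": "flourished",
}


def enter_code(code: str, table: Dict[str, str]) -> Tuple[str, str]:
    """Store only the short code; return it with the meaning to display beside it."""
    if code not in table:
        raise ValueError(f"invalid code {code!r}; valid codes: {', '.join(sorted(table))}")
    return code, table[code]
```

Only the one-character code occupies space in the record, which is what allowed the object file to stay densely packed.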

Use of the Getty Prototype Project

The Getty Prototype Project had given the museum a database describing 900 of its paintings and 600 artists, and SWAP used this data. However, the Prototype data model was designed specifically for paintings, and the size of each object record had been allowed to approach the system limit. Some alteration was therefore mandatory to fit 33,000 records, or more, onto 20 megabytes. SWAP's object file was begun by using only segments of the Prototype data. To meet SWAP's stipulated object record size of 350 characters, the inherited data was stripped of every nonessential field. In the end this cut the record size to less than 20 percent of the original.
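A rough check of the storage figures follows. It counts one byte per character and ignores index and system overhead, an assumption since the text gives no overhead figures. Take the new record as the stipulated 350 characters. Then "less than 20 percent of the original" implies an inherited record of more than 1,750 characters:

```latex
\begin{aligned}
33{,}000 \times 350 &= 11{,}550{,}000 \text{ characters} \;<\; 20 \times 10^{6},\\
\frac{350}{0.20} &= 1{,}750 \text{ characters (lower bound on the original record)},\\
33{,}000 \times 1{,}750 &= 57{,}750{,}000 \text{ characters} \;>\; 20 \times 10^{6}.
\end{aligned}
```

The reduced records therefore fit on a 20 megabyte cartridge with room to spare. Records at the inherited size would have needed nearly three times its capacity.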

This reduction should not be considered a loss as much as the first stage of a redistribution. Here, again, is where the Bernoulli System was used to good purpose. If the new object record had no space for acquisition information or inscriptions, this data was not deleted; rather, it would be attached to the object data via associated files located on the other Bernoulli disk. In fact, this new format would allow more flexible and efficient fielding. For instance, one could record multiple inscriptions per object, and not be forced to reserve space for data which did not exist.[19] One could conduct the museum's object administration without forcing the program to sift through files of curatorial data. In this way, this fixed-field relational database achieves some of the benefits of database systems offering variable length and multi-valued fielding, but without sacrificing the advantages of a relational system.
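An associated file of this kind can be sketched as a one-to-many mapping. The sketch assumes an inscriptions file keyed by the four-part accession tuple, with field names and sample values that are purely hypothetical. An object may have any number of inscription rows, and an object with none takes up no space at all.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple

Key = Tuple[str, int, int, str]  # (prefix, year, sequence, parts)

# Associated file: zero or more rows per object, nothing stored when none exist.
inscriptions: DefaultDict[Key, List[dict]] = defaultdict(list)


def add_inscription(key: Key, location: str, text: str) -> None:
    inscriptions[key].append({"location": location, "text": text})


def inscriptions_for(key: Key) -> List[dict]:
    # .get() avoids creating an empty entry for objects without inscriptions.
    return list(inscriptions.get(key, []))
```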

SWAP's architecture and development plan are clear and simple. Onto a highly controlled and authoritative core of object accession numbers and filing classifications, modules are attached as needed for new data structures or functions. As long as the core data is complete, pure, and accurate, associated data may be entered at will, as administrative and descriptive needs require. The presence of a complete inventory helps monitor the development of associated records. Subtracting the list of accession numbers in a new module from the relevant portion of the complete inventory yields a list of missing items. The user can therefore be comfortable letting the system grow only as circumstances demand and resources allow. "Tickler" reports can always reveal the items still missing from the database.
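The subtraction behind a tickler report is a set difference. This Python sketch assumes accession numbers are four-part tuples. The optional `scope` predicate is a hypothetical way of selecting the relevant portion of the inventory, such as one collection or one accession year.

```python
from typing import Callable, Iterable, List, Tuple

Key = Tuple[str, int, int, str]  # (prefix, year, sequence, parts)


def tickler(inventory: Iterable[Key], module: Iterable[Key],
            scope: Callable[[Key], bool] = lambda key: True) -> List[Key]:
    """Inventory numbers in scope that have no record yet in the given module."""
    present = set(module)
    return sorted(key for key in inventory if scope(key) and key not in present)
```

Because the inventory is complete, the report is trustworthy even when the module covers only a small fraction of the collection.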

Informix as a Relational Database Manager

Queries, Entries and Updates

Much of SWAP's power must be attributed to Informix's remarkable Perform program, its multi-purpose Query-Update-Entry utility. Perform allows the user to create complexly programmed forms that can control data entry and queries in many ways, including testing for values and acting on values entered. These are not R:base QBE, or "query-by-example," forms, where queries are executed outside of the form and displayed in it. Rather, Perform forms may be used for data entry, updating, record deletion, queries, and more. All functions are executed from within the form, in mixed succession if need be.[20] Using this facility, it is possible to implement limited Boolean searches (AND and NOT), range searches, last- and first-entry searches, double-ended wild-card searches, or truncated indexed searches. Queries may operate in several fields at a time, and the field spaces used for data entry are the same as those used for queries. Further, each Informix form can relate fields from as many as eight files simultaneously. It can also implement complex "lookups," show programmed "display-only" fields, and manipulate data entry in standard and unusual fashions.

The ability to add and query within the same form is especially valuable during data entry, and is uniquely well suited to the needs of collection management and cataloguing. For instance, one may add an object record, then turn to the artist file to query for the object's artist. If the artist is present, the artist record can be joined to the object on the fly. If not, a new artist record may be added and then joined to the new object, whose record is still current.
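The add-then-query sequence amounts to a find-or-create step followed by a link. This is a minimal Python sketch, assuming an artist file keyed by a serial number and a linker list of (accession number, artist id) pairs. Both structures are hypothetical. Matching on an exact name string simplifies what was an interactive query in Perform.

```python
from typing import Dict, List, Tuple

artists: Dict[int, dict] = {}      # hypothetical artist file, keyed by serial id
links: List[Tuple[str, int]] = []  # linker rows: (accession number, artist id)


def find_or_add_artist(name: str) -> int:
    """Query the artist file; add a new artist record only if none matches."""
    for artist_id, record in artists.items():
        if record["name"] == name:
            return artist_id
    new_id = max(artists, default=0) + 1
    artists[new_id] = {"name": name}
    return new_id


def attach(accession: str, artist_name: str) -> int:
    """Join the current object to its artist, creating the artist if needed."""
    artist_id = find_or_add_artist(artist_name)
    links.append((accession, artist_id))
    return artist_id
```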

Correcting newly entered or queried data is especially convenient. When errors are detected, the user may page back to the offending records, correct them and return to his work, all this, without changing forms and without executing another query.

Perform's most powerful attribute, and the one which makes the program so useful for scholarly work, is certainly its ability to execute "master/detail" queries. This is Informix's term for managing "one-to-many" and "many-to-many" relationships. With this feature, without changing forms, it is possible to query in one file and choose one record from the resulting query-list. All linked records in another file are then queried automatically. The query on the join is usually executed with a single keystroke, sometimes two. This powerful utility makes many otherwise routine uses of a command query language unnecessary, and is ideal for working with records composed of many fields and databases with many files. Joins are planned in advance at the database dictionary level, but are not cited there; they are defined and built into each Perform query form. After viewing the records in the linked file, the user may return to the primary list from which he started (still current), choose another record, and request another "detail." Alternatively, he can execute another linked query into a third file from his current point, thus chaining linkages and each time choosing the direction of query. Operating in a simple object catalogue, this feature allows the user to query for an artist and inspect the object records attached to him. The user can then choose one of these works and back-query to find all attributions made to it.[21]
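The chain just described (artist, then objects, then back to attributions) can be sketched over a linker file. In this Python illustration the rows pair a simplified accession string with an artist name. The names stand in for artist record ids to keep the example short, and the data are hypothetical.

```python
from typing import List, Tuple

# Linker rows: (accession number, artist name).
Link = Tuple[str, str]


def objects_of(links: List[Link], artist: str) -> List[str]:
    """Detail query: all objects linked to the chosen artist."""
    return [obj for obj, name in links if name == artist]


def attributions_of(links: List[Link], obj: str) -> List[str]:
    """Back-query: all artists to whom the chosen object is attributed."""
    return [name for o, name in links if o == obj]
```

For example, `objects_of(links, "Rubens")` lists the works linked to Rubens. Passing one of them to `attributions_of` reveals any other hands associated with it.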

Help

The standard Informix customized help system is rather spare, but Perform works well with MS-DOS macro-processor environments, like Superkey, to make up for this lack. Coded fields, often so unwieldy, but so handy when controlling a limited vocabulary set, become nearly friendly when bundled with Superkey display screens. Keystrokes may be passed through these screens into the current program to set a code. Perform also provides for "display-only fields." The data in these fields are in the form, not in the database, but appear when prompted by the appropriate screen programming. Thus, if coded fields are used, the form may be set to show their significance (Fig. 9, bottom, and Fig. 12).

Perform offers useful system help with the F1 function key. It permits user defined messages on the field level and supports programmable "error" messages. Most of its own error messages are quite clear, using the application names for fields and files when necessary.[22] In addition, full text screens may always be written into the form itself to provide information specific to the current application.

Development

Informix programs are easy to develop, but the DBMS does not offer the user a friendly environment such as the one provided for R:base. Informix does not supply its own application generator or screen painting utilities; development takes place in the command-driven DOS workspace. However, this allows the database developer great freedom to use familiar tools. All Informix code may be written as ASCII text in one's own word processor. (I use XyWrite and Nota Bene, whose editors also serve as handy shells and fine scholarly word processors.)

Error handling is usually very good. When code fails to compile, Informix creates error files. These are duplicate versions of the code-file into which the compiler has inserted notes marking and annotating errors, even giving advice about better programming practice.

Obviously, Informix merges easily with the DOS environment. Data entry screens and reports written with Informix's famous Ace report language can all be called as parameters to executable programs. Thus standard DOS menu systems may be used to create user interfaces. Informix includes its own menu utility which may be used to call programs, run batch files, and provide necessary parameters for both.

Informix 3.3 does have several undesirable features. Perform will not sort the results of a query. Characters are restricted to the seven-bit standard ASCII set, so screen painting is unattractive and international characters cannot be used. And queries are always case sensitive. Some of these faults have been addressed in the latest SQL version of the program. The new version supports diacritics and the DOS box drawing characters (with the separate 4GL programming language) and permits 16 files to be open at once, instead of just eight. All queries and sorting are still case sensitive, however. This sometimes forces the user to define fields that automatically convert all entries and queries to upper or lower case. But its biggest disadvantage for collection management is one it shares with most database management systems designed for the PC environment: fields are single valued and of fixed length, and must be coaxed into accepting certain forms of data.[23]
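The workaround for case-sensitive matching amounts to folding every entry and every query to one case. This Python sketch uses hypothetical field names. It upper-cases values both when they are stored and when they are sought, which mirrors the effect the text describes for specially defined fields.

```python
from typing import Dict, List


def store(table: List[Dict[str, str]], **fields: str) -> None:
    """Fold every value to upper case as it is entered."""
    table.append({name: value.upper() for name, value in fields.items()})


def find(table: List[Dict[str, str]], field: str, value: str) -> List[Dict[str, str]]:
    """Fold the query the same way, so the exact-match comparison ignores case."""
    wanted = value.upper()
    return [row for row in table if row.get(field) == wanted]
```

The cost is that the original capitalization is lost. This is one reason the text counts case sensitivity among the program's faults rather than a mere inconvenience.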

SWAP Attribution System

Principles

Discussion of SWAP's object attribution system has been left to the end. Thus far SWAP has been presented as a utility for the collection-manager. Its underpinning is its authoritative inventory and tools with which to build collection management functions. How does this system support a credible working environment for the curator and researching scholar without compromising its administrative purpose?

To understand SWAP's object attribution system it is necessary to follow its genesis out of the Getty Prototype database. This experimental database addressed many complex questions regarding the merging of institutional data, but it was not a registrar's tool, and was never intended to be one.[24] Rather, emphasis was placed on object description and attribution--curatorial tools.

To this end the Prototype database used a conventional three-file system to define "many-to-many" relationships among artists and objects (Fig. 10). With this mechanism any single object could be connected to any number of artists, and any single artist could be connected to any number of objects. A linker file made the "many-to-many" connection possible.[25]

SWAP used this system, too, but refined it by making the link file map the route connecting object and maker. For example, SWAP's linking file might specify the maker's role in the creation of the work. The benefit of this procedure is best seen in how SWAP handled works attributed to anonymous artists working under the influence of a known person. One example is a work attributed to "The School of Peter Paul Rubens."

In these cases the Prototype database defined a generic artist entry for each designation, "Rubens, Peter Paul, Follower of" being one (Fig. 11). Occasionally, several such "Follower of Rubens" attributions would be consolidated into a single artist record, even though the term clearly may refer to several different personalities. Similarly, "School of Rubens," "Manner of Rubens," and other attribution subtleties created distinct artist records, each with its own family of objects attached.

This procedure is a carry-over from old methods. When these phrases are written on accession cards, or appear in flat-file databases, their meanings are clear. In a relational system, where a single artist record (axiomatically a unique biographical entity) may be attached to a number of objects, their significance becomes clouded and confusing, obscuring the path to objects and falsifying the relation of objects to one another.[26]

In contrast, SWAP's attribution procedure is modeled on a simple grammatical principle. The artist file and the object file contain distinct elements; the links define the relationships between them. "Artist-Relation-Object" corresponds to "Subject-Verb-Object." Nominative entities are not inflected by the objects to which they relate.[27]

SWAP's method isolates all significant conditioning terms and puts them in the linkage file, not the artist file (Fig. 12).[28] Each qualifying anonymous object is ultimately tied to the record of the artist named. Thus a query for Rubens, for instance, leads directly to all the attributions to him and his circle. The initial search is not conditioned by whatever "uncontrolled" vocabulary curators and historians have chosen to qualify the attribution. The linkage file can also designate official and unofficial attributions and their sources, and can even record the functions of multiple makers, details of patronage, or the pictorial sources of objects copied after other works. With these capabilities the system, for all its value as an inventory and administrative tool, also begins to serve the fundamental research needs of the museum's academic staff and its scholarly public.[29]
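The effect on retrieval can be sketched in Python. The field names (`relation`, `status`) and all data are hypothetical; what the sketch preserves from the text is that the artist file holds a single record for Rubens and every qualifying term lives in the link, so one query on the name returns his own works and those of his circle together.

```python
from dataclasses import dataclass


@dataclass
class Link:
    artist_no: int
    accession_no: str
    relation: str      # conditioning term kept in the link, e.g. "School of"
    status: str = ""   # e.g. "Official" or "Unofficial"


# One artist record per person; no "Follower of" or "School of" variants.
artists = {1: "Rubens, Peter Paul", 2: "Jordaens, Jacob"}

links = [
    Link(1, "1961.12", "By", "Official"),
    Link(1, "1975.3", "School of", "Official"),
    Link(1, "1980.44", "Copy after", "Unofficial"),
    Link(2, "1961.12", "Attributed to", "Unofficial"),
]


def attributions(name):
    """Return every link to an artist whose name begins with `name`."""
    numbers = {no for no, full in artists.items() if full.startswith(name)}
    return [(l.accession_no, l.relation) for l in links if l.artist_no in numbers]


print(attributions("Rubens"))
# [('1961.12', 'By'), ('1975.3', 'School of'), ('1980.44', 'Copy after')]
```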

Recognizing that school, like artist, is an assigned attribute rather than a "property" of objects, and is consequently subject to different naming practices, specifications, and opinions, SWAP gave this entity a file of its own. The linking file was therefore given the dual function of describing just how each entity (school and/or artist) was associated with the object.
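A sketch of a link that can point into either file, in Python with hypothetical names and invented data. The text says the link describes how each entity (school and/or artist) is associated with the object; the sketch reads "and/or" as permitting a link to name an artist, a school, or both, and it rejects a link that names neither.

```python
from dataclasses import dataclass
from typing import List, Optional

artists = {1: "Rubens, Peter Paul"}
schools = {10: "Flemish"}


@dataclass
class AttributionLink:
    accession_no: str
    relation: str
    artist_no: Optional[int] = None
    school_no: Optional[int] = None

    def __post_init__(self):
        # A link must point at an artist, a school, or both.
        if self.artist_no is None and self.school_no is None:
            raise ValueError("link must name an artist and/or a school")

    def entities(self) -> List[str]:
        names = []
        if self.artist_no is not None:
            names.append(artists[self.artist_no])
        if self.school_no is not None:
            names.append(schools[self.school_no])
        return names


links = [
    AttributionLink("1975.3", "School of", artist_no=1),
    AttributionLink("1975.3", "Tradition of", school_no=10),
]
print([(l.relation, l.entities()) for l in links])
# [('School of', ['Rubens, Peter Paul']), ('Tradition of', ['Flemish'])]
```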

Because attribution opinions and similar assignments of objects to persons and schools are inserted into the attribution link file with real art-historical terminology as found in the paper records, the artist and object files are simplified. The object file contains no artist information; the artist file contains no object information.[30]

Artist Pseudonyms

Artist name authorities have always created problems in simple databases. One museum's practice of naming an artist might conflict with recommended usage; the names of many artists do not fit Western naming traditions; and the names by which artists are commonly known do not always correspond to the "first name"/"last name" structure of name fields. SWAP sidestepped this entire issue by allowing virtually any name to be entered into the artist record and by giving nearly any query a way to succeed.

This was achieved partly by using Informix's ability to index composite fields, and partly by redefining the fields into which the artist name was entered. Rather than calling the two artist name fields first and last, SWAP named them sort name and other name. The primary artist record, record "A," used these fields to render the artist's name exactly as the museum wished it to appear. It did not matter whether a user could find the artist through this version of the name; the only principle followed was proper form in reports. Since SWAP allowed any artist to be known by up to twenty-six pseudonyms, nearly any useful combination of names could be entered in addition to the required one. The museum may prefer "Barbieri, Francesco," but the database may also contain simply "Guercino." For the sake of users, Rembrandt's given name could be placed in the sort name field without abandoning the museum's preference for "Hermansz. van Rijn, Rembrandt."[31]
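The division of labour between the "A" record and its pseudonyms can be sketched in Python. The tuple layout and function names are hypothetical, and the sketch assumes the variation codes are letters, with "A" reserved for the museum's preferred form: reports read only variation "A", while a query may match either name field of any variation.

```python
# Hypothetical artist file rows: (artist_no, variation_code, sort_name, other_name)
artist_file = [
    (7, "A", "Barbieri", "Francesco"),           # museum's preferred form
    (7, "B", "Guercino", ""),                    # pseudonym, same artist number
    (12, "A", "Hermansz. van Rijn", "Rembrandt"),
    (12, "B", "Rembrandt", ""),
]


def report_name(artist_no):
    """Reports always use variation "A", the form the museum wants printed."""
    for no, code, sort_name, other_name in artist_file:
        if no == artist_no and code == "A":
            return f"{sort_name}, {other_name}" if other_name else sort_name
    raise KeyError(artist_no)


def find_artist(word):
    """A query succeeds if the word matches either name field of any variation."""
    return sorted({no for no, _, s, o in artist_file if word in (s, o)})


hits = find_artist("Guercino")
print(hits, report_name(hits[0]))   # [7] Barbieri, Francesco
```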

Behind the scenes, the artist file and the attribution link file are joined by an arbitrary artist number assigned sequentially. Artist pseudonyms share a common artist number, but each is assigned a unique name variation code. The composite of artist number and name variation code carries a unique index, which prevents any number/variation-code combination from repeating while still permitting artist numbers to repeat. The program is instructed to expect more than one artist record per artist number, just as it expects to find more than one attribution link file record per artist.
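The composite index can be demonstrated with SQLite, through Python's standard `sqlite3` module, standing in for Informix; the table and index names are hypothetical. A second record for artist 7 is accepted because its variation code differs, while an exact repeat of a number/code pair is refused.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE artist (
        artist_no  INTEGER NOT NULL,
        var_code   TEXT    NOT NULL,
        sort_name  TEXT,
        other_name TEXT
    )""")
# The pair (artist_no, var_code) may not repeat; artist_no alone may.
con.execute("CREATE UNIQUE INDEX artist_name_ix ON artist (artist_no, var_code)")

con.execute("INSERT INTO artist VALUES (7, 'A', 'Barbieri', 'Francesco')")
con.execute("INSERT INTO artist VALUES (7, 'B', 'Guercino', NULL)")  # accepted

rejected = False
try:
    con.execute("INSERT INTO artist VALUES (7, 'B', 'Guercino', NULL)")  # repeat of the pair
except sqlite3.IntegrityError:
    rejected = True

count = con.execute("SELECT COUNT(*) FROM artist WHERE artist_no = 7").fetchone()[0]
print(rejected, count)   # True 2
```

Joining the link file on `artist_no` alone then returns every variation of the name, which is why the program must expect several artist records per number.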

Similar methods were used to prevent duplicate links between artists and objects. A later version of SWAP allowed automatic sequential assignment of new artist numbers even though these values were allowed to repeat.[32]
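As note 32 explains, the number offered for a new artist is computed rather than generated by a serial field. A sketch in Python with SQLite, assuming that "the last number assigned" means the highest number in the file; the function and table names are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE artist (artist_no INTEGER NOT NULL, var_code TEXT NOT NULL, "
            "sort_name TEXT, UNIQUE (artist_no, var_code))")


def offer_new_artist_no(con):
    """Offer the highest artist number in use plus one (1 for an empty file).

    artist_no is deliberately not a serial column, because pseudonyms must be
    able to reuse a number; the offer is only a default the user may override.
    """
    (last,) = con.execute("SELECT MAX(artist_no) FROM artist").fetchone()
    return 1 if last is None else last + 1


n = offer_new_artist_no(con)                                          # 1
con.execute("INSERT INTO artist VALUES (?, 'A', 'Barbieri')", (n,))
con.execute("INSERT INTO artist VALUES (?, 'B', 'Guercino')", (n,))   # same number
print(offer_new_artist_no(con))                                       # 2
```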

Controls on file integrity

As systems created from the bottom up grow in complexity and add more functions and more files, there is justified concern that data can wander away from its associated records. SWAP defined a protocol that governs the maintenance of linkages and preserves the authority of the data (Fig. 13). Some of the rules for the files discussed so far follow.

For the Inventory File: No object record could be entered into the inventory file unless its accession number was unique. No accession number could be entered unless it was assigned an authorized set of classifications corresponding to extant drawer divisions.

For the Object File: No object record could be entered into the object file unless its accession number was unique and already existed in the inventory file. Only one object record could exist per inventory record. The inventory record could not be deleted or changed when an object record was attached to it.

For the Attribution System: When an object file record was linked to an artist file record, the artist record could not be deleted without unhooking all link file attachments to the artist. Similarly, objects could not be deleted while they were attached to artists or schools; the links joining them had to be removed first.
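Most of these rules map onto declarative constraints in a modern relational engine. The following sketch uses SQLite through Python's `sqlite3` module, not Informix 3.3; its table and column names are hypothetical, and it simplifies the artist file to one record per artist number, omitting pseudonyms. Uniqueness becomes a primary key, "must already exist" becomes a foreign key, and "could not be deleted while attached" becomes `ON DELETE RESTRICT`.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")   # SQLite enforces foreign keys only when asked
con.executescript("""
CREATE TABLE classification (code TEXT PRIMARY KEY);      -- authorized drawer divisions
CREATE TABLE inventory (
    accession_no TEXT PRIMARY KEY,                          -- accession number must be unique
    class_code   TEXT NOT NULL REFERENCES classification(code)
);
CREATE TABLE object (                                       -- at most one per inventory record
    accession_no TEXT PRIMARY KEY
        REFERENCES inventory(accession_no) ON DELETE RESTRICT ON UPDATE RESTRICT,
    title TEXT
);
CREATE TABLE artist (artist_no INTEGER PRIMARY KEY, sort_name TEXT);
CREATE TABLE link (
    artist_no    INTEGER NOT NULL REFERENCES artist(artist_no) ON DELETE RESTRICT,
    accession_no TEXT    NOT NULL REFERENCES object(accession_no) ON DELETE RESTRICT
);
""")

con.execute("INSERT INTO classification VALUES ('PAINTING')")
con.execute("INSERT INTO inventory VALUES ('1961.12', 'PAINTING')")
con.execute("INSERT INTO object VALUES ('1961.12', 'Head of a Bearded Man')")
con.execute("INSERT INTO artist VALUES (1, 'Rubens, Peter Paul')")
con.execute("INSERT INTO link VALUES (1, '1961.12')")


def refused(sql):
    """Return True if the database rejects the statement."""
    try:
        con.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


print(refused("INSERT INTO object VALUES ('1999.1', 'x')"))            # True: no inventory record
print(refused("INSERT INTO inventory VALUES ('1999.2', 'SCULPTURE')")) # True: unauthorized class
print(refused("DELETE FROM inventory WHERE accession_no = '1961.12'")) # True: object attached
print(refused("DELETE FROM artist WHERE artist_no = 1"))               # True: still linked
con.execute("DELETE FROM link WHERE artist_no = 1")                    # unhook the link first
print(refused("DELETE FROM artist WHERE artist_no = 1"))               # False: now allowed
```

Only a change to the accession number itself is blocked by `ON UPDATE RESTRICT`; the broader rule that an inventory record could not be changed at all while an object record was attached would need a trigger.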

These and other programmed controls make the database difficult to corrupt through sloppy maintenance. Data is secure by virtue of the rules required to create and remove entries. This means that the system may be administered by several people in succession, with full confidence that the private working habits of a former operator will not have seriously compromised the ability of the database to perform for its current workers.

Most importantly, here is a database that began as a simple inventory project. In a short time, with minimal financial commitment, it evolved into a system of moderate sophistication with potential for further growth. SWAP is the kind of system that can be put together by someone who has not been trained in database techniques or theory. It does not offer the bells and whistles found in some of the commercial systems, but it goes further than some in respecting the differing needs of all those who must use or manage object data within the small museum environment.

Figures

1.   Database Architecture showing Object file partitions.
2.   Accession Number Inventory Report.
3.   Data Entry Update Query Screen for Inventory file.
4.   Curator's card drawer list.
5.   Tally list by collection classification.
6.   Master classification list.
7.   Inventory and Catalogue Module Data Structure.
    File and Classification "Lookup".
    Inventory form and Core Catalogue Form.
8.   Bernoulli strategy schematic.
9.   SHORTCAT SCREEN. (Object Query Update Entry Screen.)
    Top: Object file fields Noted.
    Bottom: Link file fields and Link "Display-Only" fields.
10.   Diagram of a "many-to-many" relation.
11.   Attribution System. Getty Prototype vs. SWAP.
12.   Attribution System. Link file conditions.
13.   Catalogue schematic. Master/Detail relations and "lookups".

Notes

1. A shorter version of this paper was read at the 1988 conference of the Museum Computer Network at Santa Monica, California.
2. Within the last few years museums have seen the advent of ever more capable automated management and cataloguing systems. Among other features, these programs allow the user to execute sophisticated searches through structured lexicons and thesauri, and submit routine administrative procedures to computer control. Although some of these systems are wonderfully sophisticated, they may be costly.
3. Prices are quoted at approximate mail-order discount: XT-clone MS-DOS computer with a 20- or 30-megabyte hard disk: $1500; Bernoulli Box: $1700; 10 Bernoulli disks: $800; DBMS and related software: $1500; printer: $350; miscellaneous supplies: about $350. This assortment forms the minimal configuration necessary to yield acceptable results for collections of up to forty or fifty thousand objects. Newer MS-DOS computers based on the Intel '286 or '386 chip will work faster. The Bernoulli Box is not the fastest storage device, but its speed can be increased on AT-class and faster computers by decreasing the interleave value.
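Summed, the quoted figures give the approximate outlay for the complete minimal configuration:

$$1500 + 1700 + 800 + 1500 + 350 + 350 = 6200$$

that is, about $6200 at the quoted prices, of which the Bernoulli Box and its ten disks account for $2500.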
4. Many readers may find espousal of this dual ambition surprising, for database managers commonly have been forced to adopt schematized or abbreviated notations for objects. It was thought that the hard facts required for clean reports and efficient object management could not coexist with the soft facts: the opinions and value judgments that make up the critical object history prized by curators.
  The following hypothetical situation illustrates how real-world records might describe an object:

Over the years, historians and our curators have attributed this painting to three artists, Tom, Dick, and Harry. The present curator, however, believes this work to be from Harry's workshop, with traces of the studio hands Larry and Carrie. Yet under the terms by which the work was given to the museum, it must be identified in the museum and in museum publications as by Dick, whose lost composition our painting probably imitates. Incidentally, Harry is commonly known by several pseudonyms, but the museum insists on using his given name, by which few know him.

  Not many modest collection databases can accommodate this kind of data in their artist/object attribution apparatus.
5. A potentially intricate and frustrating endeavor when diverse collections must be accommodated.
6. In the MS-DOS world the resulting database system is often of the fixed-field variety, and its file structure is either flat (a single file) or related (essentially a set of joined flat files). Advanced home-grown arrangements might include separate "maker" and other files or tables to reduce the occurrence of repeating data. Transaction records are kept in similar linked files. Some systems might apply this concept to other situations where data tends to duplicate, including, perhaps, cultural or stylistic groupings, donors, bibliography, and exhibition history, all of which must be linked to object descriptions.

Some of the more sophisticated formats of this genre use a three-file system to establish "many-to-many" relations. With these, one object can be linked to multiple makers, and one maker linked to multiple objects. One borrower record can be linked to all items borrowed, and one item can be linked to all those who have borrowed it. But servicing these systems and getting them running often requires intermediate data entry forms or other slow and intricate methods of data entry. The resulting systematic transfer of complete object records is often unproductive and unnecessary; the process ignores the fact that only a small portion of a museum's accumulated data will be useful for current operations.

Although remarkable techniques for rapid data entry have been developed, and clever design of data entry screens will go a long way to make life pleasant, the end product may still be very wasteful of computer resources, yielding slow stodgy systems, packed with empty fields and repeating data.
7. The museum for which the program was built wishes to remain anonymous. SWAP was a spare-time enterprise undertaken amid this writer's other duties as the museum's Prototype Project administrator.
8. The Museum Prototype Project, sponsored by The Getty Art History Information Program (AHIP), studied the feasibility of creating a unified collection catalogue for paintings residing in eight trial museums. When project funding ceased, SWAP's development halted. The program was abandoned by the museum, functional but incomplete. PCPHASE, the database provided to the museum by the Prototype Project, was abandoned too.
9. Informix, Version 3.3 subsequently has been replaced by a more advanced and versatile release. Some of its features are discussed on page.
10. Records in the accession ledgers were fixed, but the cards in the drawers could be rearranged, and their order tended to follow the logic of whoever cared for the collection.
11. Additionally, as each object was packed for transfer, its accession number and packing-box number were entered into another, coördinated database. The goal was to compare the list of objects with the list of cards in order to discover those cards for which no objects were found, and those objects having no cards. This packing database was to form the basis of a true location list as unpacking commenced, but this step was probably never realized.
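The comparison described here is a pair of set differences. A Python sketch with invented accession and box numbers:

```python
# Invented data: accession numbers found on catalogue cards, and accession
# numbers recorded (with packing-box numbers) as objects were packed.
card_numbers = {"1961.12", "1975.3", "1980.44"}
packed = {"1961.12": 14, "1975.3": 14, "1988.7": 22}   # accession no -> box no

cards_without_objects = sorted(card_numbers - set(packed))
objects_without_cards = sorted(set(packed) - card_numbers)

print(cards_without_objects)   # ['1980.44']
print(objects_without_cards)   # ['1988.7']
```

The box numbers play no part in the comparison, but they are what would have turned the packing list into a location list on unpacking.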
12. For example, if an artist field were added, each record would be given field spaces for an artist name, or spaces in which to indicate school, or media, or donor. Some flat-file and relational databases designate fields for several artists, even adding areas for printers, publishers, manufacturers, and so on--wasting more data storage resources and introducing unnecessary complications for queries and reports.
13. The object file might better be considered a "Working Object File," at least provisionally.
14. Whereas some objects, such as the sketchbook cited above, required expansion, others needed conflation. Although the museum would try to assign consecutive accession numbers to objects belonging to a group, as in a portfolio of prints, there was no way to identify groups. Sometimes such lots would be given to the museum in stages, so their numbering could not be consecutive. For these situations, another field, not part of the accession number composite, served to unite disparate objects. Arbitrary numbers entered into a lot field, and described in a specially reserved portion of the title field, allowed users to collect each such grouping with a single query. This field was a data element in the catalogue file, not the inventory file.
15. The Bernoulli system makes an excellent backup device. It takes less than five minutes to produce an exact duplicate of a 20 megabyte disk. Copies made this way need not be "restored" for use. When the Bernoulli Box is used for the database, it need not interfere with other uses since the entire database or any other application can be removed easily. This means that if the collection manager already has an MS-DOS computer and printer, the only equipment he need purchase is the Bernoulli device.

The printer does not have to be fast, but it must be reliable. I chose the Epson FX-85 because of its proven worth and low cost, and because it was possible to make it print the entire IBM character set, including characters with diacritical marks.
16. The system interface prompted for insertion of the proper disks, loaded selected Entry-Update-Query forms and custom help files, and ran reports and utilities. Building the interface required no specialized programming ability. It was created out of DOS batch files and Superkey macros, all called through the Informix menu utility. When a query or report form was called, a batch file would check for the correct disposition of disks, prompt for a missing disk, load the proper Superkey help screen set, and finally load the form or run the report.
17. Specialists in rapid data entry will be quick to tell you that this type of crowded form is not efficiently designed for their purpose. They are right, of course; but placing all fields on a single screen does make it very easy to use Informix's form-Query-Update-Entry utility, and the impression of crowding disappears with familiarity. Rapid data entry screens should have a simple layout, address a limited number of fields, and minimize cross-file relations, but they are not efficient tools for general object administration and collection queries.
18. These values were not data elements. They were programmed into the form and did not take up precious data storage resources.
19. Not every object contains an inscription. No database planner would want to waste valuable space in every object record for inscription text, notes about its meaning, language, location, author, calligraphy, style, media, etc. An inscription file can link its records to the small number of inscribed objects. Another benefit: no compromise need be made when one object has several inscriptions; just link several inscription records to one object.
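A minimal sketch of such a separate inscription file, in Python with hypothetical field names and invented data: each inscription record carries the accession number of its object, so an uninscribed object costs nothing and a multiply inscribed one simply has several records.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Inscription:
    accession_no: str   # link back to the inscribed object
    text: str
    location: str
    language: str


# Only inscribed objects get inscription records; one object may have several.
inscriptions = [
    Inscription("1961.12", "Anno 1618", "lower left", "Latin"),
    Inscription("1961.12", "No. 42", "verso", "English"),
]

by_object = defaultdict(list)
for ins in inscriptions:
    by_object[ins.accession_no].append(ins)

print(len(by_object["1961.12"]), len(by_object.get("1975.3", [])))   # 2 0
```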
20. Any number of forms may be created for the same materials. One form may bar designated activities in specified files and fields, while another permits them. One form may be designed for rapid data entry and another for general system use and occasional entry.
21. A typical query session might proceed as follows. The goal is to find the complete set of attributions for an object we know only as the work of one particular artist. We do not know whether other attributions for this object exist in the database. We do not know the object's accession number, and may not even remember its title. The strategy is to query the artist file under the name we know, explore the links to objects, choose the required object, and then find all its links to artists and schools.
(1)   Query artist file for artist. Artist record is current. One link and its one object record also show.
(2)   Ask for detail of link file from artist file. Link file is now active, all links to artist are current.
(3)   Page through link file records to find target object. (Object records joined to link records show as each new link record is displayed).

The above procedure is standard for finding all objects attached to an artist. Now we continue, searching for all links to a particular object.

(4)   Ask for detail of object file from appropriate link record. One target object record is now current.
(5)   Ask for detail of link file from object record to see all artist and school records attached to object. Page through links to see all connected schools and artists. Any current query list can be output to system files or printer. Its form copies the screen format.

  However, if we had known the accession number of the object, we could have queried for it in the object file and moved directly to step five. These instructions are more complex in the telling than in practice. The sequence outlined above requires input of only the artist's name, or a truncated part of it. The rest of the query procedure is accomplished with simple system controls: no data are input, and no linking fields are filled in for queries.
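The five steps amount to following links outward from an artist and then back in from an object. A Python sketch with invented data and hypothetical names; as in the procedure above, the only input is a truncated artist name.

```python
# Hypothetical miniature files; all names and numbers are invented.
artists = {1: "Rubens, Peter Paul", 2: "Jordaens, Jacob"}
schools = {10: "Flemish"}
# Link records: (accession_no, kind, entity_no, relation)
links = [
    ("1961.12", "artist", 1, "Attributed to"),
    ("1961.12", "artist", 2, "By"),
    ("1961.12", "school", 10, "Tradition of"),
    ("1975.3", "artist", 1, "School of"),
]


def name_of(kind, number):
    return artists[number] if kind == "artist" else schools[number]


# Steps 1-2: query the artist file by a truncated name, then make its links current.
artist_no = next(no for no, name in artists.items() if name.startswith("Jord"))
artist_links = [l for l in links if l[1] == "artist" and l[2] == artist_no]

# Steps 3-4: page through those links and take the target object.
target = artist_links[0][0]

# Step 5: from the object, list every artist and school linked to it.
all_attributions = [(rel, name_of(kind, no)) for acc, kind, no, rel in links if acc == target]
print(all_attributions)
# [('Attributed to', 'Rubens, Peter Paul'), ('By', 'Jordaens, Jacob'), ('Tradition of', 'Flemish')]
```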
22. In contrast, Informix's system error messages are opaque.
23. Informix for MS-DOS environments is a single-user system with no concurrency, password, or security controls. Dedicating specific disks to sensitive information, including but not limited to Donor and Vendor files and Valuation files, overcomes this limitation somewhat, in effect creating a poor man's security system. One of the virtues of the removable Bernoulli cartridge is the de facto database security it provides. If physical disks are under lock and key, they cannot be accessed. If sections of the database are to be made available at other locations on a read-only basis, duplicates of the required files may be run on independent machines. Unauthorized access cannot occur if data is not present. Unauthorized writes on satellite machines never alter the main data files. Transferring whole files to satellite locations is much easier than down- and up-loading files or tables into alternate databases. Single-user MS-DOS Informix applications may be upgraded to a multi-user UNIX system, with security password controls, if necessary.
24. Utilities were not provided with which to register object location, object valuation, loan activity, etc.
25. This is a standard database management tool, such as is commonly used to associate students to courses and courses to students; clients to services, services to clients, etc.

The Prototype object record included some fields that might better have been conceived as relations. One of these was donor, another was school. The latter could not be repeated within the record for a single object. When these fields are defined as part of the object record, they must be understood as "properties." Properties such as size offer little opportunity for subjective conditioning and uncontrolled multiple nomenclatures, though they may be multi-valued. Multi-valuedness, in itself, does not constitute a relationship. Lists of media, where nomenclature problems do exist, are best handled by multi-valued and variable-length fields tied to a lexicon. Object titles present additional problems. Although often determined by opinion, tradition, and scholarly interpretation, titles are usually thought to be a property of the work of art, even when one work may be known by many titles. Although overlapping is to be expected, the title field should not be confused with fields created for subject access. SWAP made no attempt to offer multi-valued titles or provide subject access, but there is no reason why a subject file cannot be linked to object records for this purpose. See Figure 1.
26. In cases where objects are truly anonymous, be they Roman glass vessels or Renaissance paintings, the user is encouraged to define no artist at all. In these situations, all relevant information derives from the object. The database of the Prototype Project, insisting on the metaphor of the maker, defined anonymous artists with two-hundred-year active lifespans for objects attributed to the seventeenth or eighteenth centuries, and active lives of a single year for anonymous makers of dated paintings. In contrast, breaking its own rules, SWAP offered a single anonymous artist record to which any such object could be attached, though technically, anonymous works should be left unattached.
27. The following variations on attributions to Rubens and his circle were found in the databases of several of the museums participating in the Prototype Project:

Rubens, Peeter Pauwel
Rubens, Peeter Pauwel (Copy after)
Rubens, Peeter Pauwel (Follower of)
Rubens, Peeter Pauwel (Imitator of)
Rubens, Peter Paul
Rubens, Peter Paul (Studio of)
Rubens, Peter Paul, Attributed to
Rubens, Peter Paul, Copy after (Flemish, probably XVII century)
Rubens, Peter Paul, Copy after (XVII century)
Rubens, Peter Paul, Copy after (probably XVIII century)
Rubens, Peter Paul, Sir
Rubens, Peter Paul, Sir (Studio of)
Rubens, Peter Paul, Workshop of

  It seems obvious from the above list that anonymous objects provide the source of the artist name. These names do not stand for identifiable but anonymous personalities, as, for example, the "Master of the Housebook" does. Rather, they are generic attributions, personifications or projections drawn out of the objects, or analogies made to styles of known masters. In these cases, the database is not relational but tautological.
28. SWAP's attribution link file provides four filters. The first identifies the kind of artist record attached: thus one may specify that the object is a) by the hand of the linked artist, b) anonymous, but attached to the named person, or c) anonymous, and linked to a school or cultural tag. The second field designates the manner of relation: "School of," "Tradition of," "After a design by," etc. The third provides an opportunity to register an opinion about the stated relation: "Verified," "Rejected," "Traditional," "Dubious," etc. The fourth offers space in which to cite the artist's role and is used to specify printers, publishers, founders, assistants, and so on.

Potentially, each link could be signed: Scholar A says the object is a "copy after;" scholar B says it is "autograph;" scholar C says it is "definitely Jordaens;" but the museum has officially called it "School of Rubens." If desired, additional programming could connect each link to a bibliography or citation file.
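The four filters, plus the signing the note proposes, fit naturally into one record type. In this Python sketch the class and field names are hypothetical, `signed_by` is the proposed (not implemented) signature, and the opinion value "Official" is assumed from the text's mention of official attributions; the data restate the scholars' example above.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    """Filter 1: the kind of record attached (a, b, c in the note)."""
    BY_HAND = "a"      # by the hand of the linked artist
    ANON_NAMED = "b"   # anonymous, attached to the named person
    ANON_SCHOOL = "c"  # anonymous, linked to a school or cultural tag


@dataclass
class AttributionLink:
    accession_no: str
    entity: str           # artist or school named by the link
    kind: Kind
    manner: str = ""      # filter 2: "School of", "After a design by", ...
    opinion: str = ""     # filter 3: "Verified", "Rejected", "Dubious", ...
    role: str = ""        # filter 4: printer, publisher, founder, assistant, ...
    signed_by: str = ""   # hypothetical extension: who holds the opinion


links = [
    AttributionLink("1961.12", "Rubens", Kind.ANON_NAMED, "Copy after", signed_by="Scholar A"),
    AttributionLink("1961.12", "Rubens", Kind.BY_HAND, signed_by="Scholar B"),
    AttributionLink("1961.12", "Jordaens", Kind.BY_HAND, signed_by="Scholar C"),
    AttributionLink("1961.12", "Rubens", Kind.ANON_NAMED, "School of", "Official", signed_by="Museum"),
]

official = [(l.manner, l.entity) for l in links if l.opinion == "Official"]
print(official)   # [('School of', 'Rubens')]
```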
29. In the Getty Prototype system a user could not execute an indexed query on "Rubens" to obtain a single list of all objects attached to his name. To find all works given to Rubens, his school, and so on, the researcher would have to query the artist file for every occurrence of the name Rubens, then issue a new query on each permutation bearing his name to find the attached objects. The method is awkward, and quickly brings the user near his frustration threshold. Although the distinctions between "manner of" and "style of" may be meaningful art-historically, and certainly must be respected, the method ignores how databases are actually used to find objects and attributions. Rather than aiding the user, it stymies him and places impediments in his way.
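The cost of the Prototype arrangement can be made concrete by counting queries. In this Python sketch (invented data, hypothetical function names), each qualified phrase is its own artist record, so finding everything attached to Rubens takes one query on the artist file plus one query per permutation of his name.

```python
# Prototype-style artist file: each qualified phrase is a separate "artist".
proto_artists = {
    101: "Rubens, Peter Paul",
    102: "Rubens, Peter Paul (Studio of)",
    103: "Rubens, Peter Paul, Copy after (XVII century)",
}
proto_links = [(101, "1961.12"), (102, "1975.3"), (103, "1980.44")]
queries = 0


def query_artist_names(word):
    global queries
    queries += 1
    return [no for no, name in proto_artists.items() if word in name]


def query_objects_for(artist_no):
    global queries
    queries += 1
    return [acc for no, acc in proto_links if no == artist_no]


found = []
for no in query_artist_names("Rubens"):   # one query finds every permutation...
    found += query_objects_for(no)        # ...then one more query per permutation

print(found, queries)   # ['1961.12', '1975.3', '1980.44'] 4
```

With a single artist record and the qualifiers held in the link file, the same result needs one query on the name, however many qualified forms the attributions use.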
30. Following the practice established in the Prototype database, each assignment of artist to object was numbered in the link file. This procedure enables related schools, artists, and makers to be sorted in any specified sequence. A similar field in the object record indicated the total number of links attached. The data entry forms were programmed to prohibit entry of more attribution links than had been authorized in the object record. If the database grew to such proportions that the scholarly trappings overwhelmed the more limited needs of the registrar, any query made from the object file could be conditioned by limiting the number of artist or school records allowed to appear.
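A sketch of the two mechanisms in this note, in Python with a hypothetical `ObjectRecord` class: links carry user-assigned sequence numbers and are displayed in that order, the object record holds the authorized link count and refuses entries beyond it, and a display limit gives the registrar a shorter view.

```python
class ObjectRecord:
    """Hypothetical object record that carries its authorized number of links."""

    def __init__(self, accession_no, links_authorized):
        self.accession_no = accession_no
        self.links_authorized = links_authorized
        self.links = []  # (sequence_no, entity, relation)

    def add_link(self, sequence_no, entity, relation):
        # Refuse entry beyond the number of links authorized in the object record.
        if len(self.links) >= self.links_authorized:
            raise ValueError(f"{self.accession_no}: only {self.links_authorized} links authorized")
        self.links.append((sequence_no, entity, relation))

    def shown(self, limit=None):
        # Links in their assigned sequence, optionally cut off for a short view.
        ordered = sorted(self.links)
        return ordered if limit is None else ordered[:limit]


obj = ObjectRecord("1961.12", links_authorized=2)
obj.add_link(2, "Flemish", "Tradition of")
obj.add_link(1, "Rubens, Peter Paul", "School of")
try:
    obj.add_link(3, "Jordaens, Jacob", "Attributed to")
except ValueError as err:
    print(err)                 # 1961.12: only 2 links authorized
print(obj.shown(limit=1))      # [(1, 'Rubens, Peter Paul', 'School of')]
```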
31. Persons do not need given names. Master of Flémalle, Hand G, and Monogrammist CC are all acceptable.
32. Because these values are allowed to repeat, the artist number cannot be a serial field. Although the user is free to assign any number he wishes, any request for a new number automatically offers the last number assigned plus one.
