This page last changed on Apr 15, 2007 by amitku.

Overview

Chunk file creation is semi-automatic. The file is created in stages, and in each stage the @stage attribute of the root <collection> element is incremented and the @date-modified attribute is updated.
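The per-stage bookkeeping described above could be sketched in Java roughly as follows. This is only an illustration: the class and method names are hypothetical, and the ISO date format and the handling of a missing @stage attribute are assumptions that the original tooling may not share.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.time.LocalDate;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of the original tooling.
class StageBump {

    /** Increments @stage on the root element and sets @date-modified to the given date. */
    static String advanceStage(String chunkXml, LocalDate today) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(chunkXml)));
        Element root = doc.getDocumentElement();

        // Assumption: a missing @stage attribute is treated as stage 0.
        String current = root.getAttribute("stage");
        int stage = current.isEmpty() ? 0 : Integer.parseInt(current);
        root.setAttribute("stage", Integer.toString(stage + 1));

        // Assumption: dates are written in ISO form (yyyy-MM-dd).
        root.setAttribute("date-modified", today.toString());

        // Serialize the modified DOM back to a string.
        Transformer serializer = TransformerFactory.newInstance().newTransformer();
        serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        serializer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String before = "<collection stage=\"1\" date-modified=\"2007-04-01\"><head/></collection>";
        System.out.println(advanceStage(before, LocalDate.now()));
    }
}
```

In practice this step would presumably run at the end of each stage, after the chunks for that stage have been appended.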

The number of stages depends on the number of chunk types that need processing. For example, work- and chapter-level chunks need three stages: the skeleton, processing for works, and processing for chapters.

Stages

Skeleton

The first stage creates a skeleton using Java code that scans a directory for all XML files and creates a <head> element containing a <files> element, along with user-supplied information about the XML database URL and the name of the collection. By default, the Java code also creates the work-type divs, mapping each /TEI.2/text element to an individual work and /TEI.2/teiHeader/fileDesc/titleStmt/title to the chunk title. An example of this generated skeleton is below; note the use of XPaths. These are default values for TEI documents, and the user can change them before going on to the next step.
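For readers without access to the original generator, the directory scan and skeleton assembly might look something like the Java sketch below. It is not the project's actual code. Only <collection>, <head>, <files>, @stage and the two default XPaths come from the description above; the class name, the <name>, <database-url> and <file> elements, and the attributes on <div> are hypothetical placeholders.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical sketch of a skeleton generator.
class SkeletonBuilder {

    /** Returns the sorted names of the *.xml files directly inside dir. */
    static List<String> listXmlFiles(Path dir) throws IOException {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*.xml")) {
            for (Path p : stream) {
                names.add(p.getFileName().toString());
            }
        }
        Collections.sort(names);
        return names;
    }

    /** Creates an element with the given name and text content. */
    static Element textElement(Document doc, String name, String text) {
        Element e = doc.createElement(name);
        e.setTextContent(text);
        return e;
    }

    static String buildSkeleton(Path dir, String dbUrl, String collectionName) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element collection = doc.createElement("collection");
        collection.setAttribute("stage", "1"); // assumption: stages are numbered from 1
        doc.appendChild(collection);

        Element head = doc.createElement("head");
        collection.appendChild(head);
        head.appendChild(textElement(doc, "name", collectionName)); // hypothetical element
        head.appendChild(textElement(doc, "database-url", dbUrl));  // hypothetical element

        Element files = doc.createElement("files");
        head.appendChild(files);
        for (String name : listXmlFiles(dir)) {
            files.appendChild(textElement(doc, "file", name)); // hypothetical element
        }

        // Default TEI XPaths from the text; the attribute names are hypothetical.
        Element work = doc.createElement("div");
        work.setAttribute("type", "work");
        work.setAttribute("select", "/TEI.2/text");
        work.setAttribute("title", "/TEI.2/teiHeader/fileDesc/titleStmt/title");
        collection.appendChild(work);

        Transformer serializer = TransformerFactory.newInstance().newTransformer();
        serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        serializer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println(buildSkeleton(dir, "xmldb:example", "demo-collection"));
    }
}
```

Keeping the XPaths as plain attribute values in the skeleton is what would let a curator edit them by hand before the next stage runs.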

A set of XSL files is used to process the skeleton generated by the Java code. The collection curator modifies the XPaths and runs XSL transformations on the chunk file. The transformations need access to the source files on the file system to work. Essentially, the XSLT scripts grab label information, generate the unique identifier for the chunk type, and append the new chunks found in each file.
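As a rough illustration of running such a transformation, the sketch below applies a small stylesheet to a chunk file using the standard javax.xml.transform API. The stylesheet is a hypothetical stand-in, not the project's XSL. It copies the chunk file unchanged and appends one <chunk> per listed file, with a simple positional identifier. The real scripts read the source documents from the file system to extract labels, which this example does not attempt. The <chunk> and <file> elements and their attributes are assumptions.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical sketch of one expansion stage.
class ChunkExpander {

    // Identity transform plus a template that appends one <chunk> per <file>.
    static final String EXPAND_WORKS_XSL =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:template match='@*|node()'>"
        + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
        + "</xsl:template>"
        + "<xsl:template match='collection'>"
        + "<xsl:copy>"
        + "<xsl:apply-templates select='@*|node()'/>"
        + "<xsl:for-each select='head/files/file'>"
        + "<chunk type='work' id='work-{position()}' file='{.}'/>"
        + "</xsl:for-each>"
        + "</xsl:copy>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Applies the given stylesheet to the chunk file and returns the result as a string. */
    static String apply(String xsl, String chunkXml) throws Exception {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(chunkXml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String skeleton = "<collection stage='1'><head><files>"
                + "<file>a.xml</file><file>b.xml</file>"
                + "</files></head></collection>";
        System.out.println(apply(EXPAND_WORKS_XSL, skeleton));
    }
}
```

Building each stage on an identity transform keeps everything from earlier stages intact, so each stylesheet only has to describe what it adds.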

Expansion

In each stage, one chunk type is expanded and its chunk metadata is incorporated into the chunk property file. For example, in the third stage the work and paragraph chunks have been expanded, as shown below for the Making of America collection.

A sample XSL file that can be used as a template for supporting other semantic structures: extractWorkChapters.xsl (not documented).


step1.jpg (image/jpeg)
skeleton1.jpg (image/jpeg)
extractWorkChapters.xsl (text/xml)