|
This page last changed on Apr 15, 2007 by amitku.
What are Nora Chunk and Nora DB?
Nora Chunk is a collection-level abstraction layer, wrapped in a Java API, that supports the following operations on a collection [1] of texts:
- Retrieval of metadata about the collection. The metadata could point to EAD/METS.
- Retrieval of the collection's structure, which defines the semantic hierarchy of work, chapter, volume, and line group. These semantic units are defined at the collection level and can be unique to each collection.
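As a rough illustration of these two operations, the sketch below models a collection that carries a metadata pointer and a tree of semantic units. Every name in it (`StructureUnit`, `TextCollection`, `metadataRef`) is hypothetical and is not taken from the actual Nora Chunk API; it only shows one way a per-collection hierarchy could be represented.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model for illustration only; not the real Nora Chunk API.
class StructureUnit {
    final String type; // semantic unit, e.g. "work", "chapter", "linegroup"
    final List<StructureUnit> children = new ArrayList<>();

    StructureUnit(String type) {
        this.type = type;
    }

    // Adds a child unit and returns this unit so calls can be chained.
    StructureUnit add(StructureUnit child) {
        children.add(child);
        return this;
    }

    // Depth of the hierarchy rooted at this unit (a leaf has depth 1).
    int depth() {
        int max = 0;
        for (StructureUnit c : children) {
            max = Math.max(max, c.depth());
        }
        return 1 + max;
    }
}

class TextCollection {
    final String name;
    final String metadataRef;      // assumed: a pointer to an EAD/METS record
    final StructureUnit structure; // root of the collection-specific hierarchy

    TextCollection(String name, String metadataRef, StructureUnit structure) {
        this.name = name;
        this.metadataRef = metadataRef;
        this.structure = structure;
    }
}
```

Because the unit types are plain strings, each collection can define its own set of semantic units without any change to the classes.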
Nora DB is an abstraction built on top of Nora Chunk/eXist and Lucene that allows users to run:
- Queries about collection entities.
- Analytical queries (frequency and part of speech, POS).
- XPath queries.
- Full-text queries, including any or all of:
  - Wildcard searches
  - Fuzzy searches
  - Proximity searches
  - Range searches
  - Grouping
  - Boolean operators
For the full-text query syntax, see http://lucene.apache.org/java/docs/queryparsersyntax.html
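To make the full-text query types concrete, the sketch below pairs each one with an example string written in Lucene query parser syntax. The search terms and the `date` field are invented for illustration, and the class does not call Lucene or Nora DB; how such strings are passed to Nora DB is not covered here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class QuerySyntaxExamples {
    // Maps each full-text query type to one example in Lucene query parser syntax.
    static Map<String, String> examples() {
        Map<String, String> q = new LinkedHashMap<>();
        q.put("wildcard", "lov*");                      // any term starting with "lov"
        q.put("fuzzy", "roam~");                        // terms similar in spelling to "roam"
        q.put("proximity", "\"love death\"~10");        // both words within 10 positions
        q.put("range", "date:[18500101 TO 18601231]");  // "date" is a hypothetical field
        q.put("grouping", "(love OR death) AND night"); // parentheses group sub-queries
        q.put("boolean", "love AND NOT hate");          // AND, OR, NOT operators
        return q;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> e : examples().entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue());
        }
    }
}
```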
Entities of Nora Chunk and Nora DB
- Collection: the collection structure.
- Chunk: render the chunk as XML/HTML, also with a selected feature (keyword-in-context feature).
- Features: words, bigrams, and trigrams (number and position).
- POS for words, and POS phrases; for example, "lovely, smart and beautiful" is (adjective adjective conjunction adjective).
- Position attribute of a feature (where it starts and ends in the text).
- Number of occurrences of a feature in a chunk and in a collection.
- Number of occurrences of a feature with a POS role (for example, "love" as a verb) in a chunk and in a collection.
- Position of a feature with a POS role (where it starts and ends in the text).
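The feature entities above (n-grams with their count and position) can be sketched as follows. This is a minimal illustration, not the Nora implementation: the class `NGramSketch`, its methods, and the tokenization rule (a token is a run of letters) are all assumptions, and POS tagging is left out because it needs an external tagger.

```java
import java.util.ArrayList;
import java.util.List;

class NGramSketch {
    // One occurrence of a feature: its text plus start (inclusive)
    // and end (exclusive) character offsets in the chunk.
    static final class Occurrence {
        final String text;
        final int start;
        final int end;

        Occurrence(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }
    }

    // Returns every n-gram of the chunk (n = 1 words, 2 bigrams, 3 trigrams).
    static List<Occurrence> ngrams(String chunk, int n) {
        // Find token boundaries by scanning for runs of letters.
        List<int[]> tokens = new ArrayList<>();
        int i = 0;
        while (i < chunk.length()) {
            if (Character.isLetter(chunk.charAt(i))) {
                int s = i;
                while (i < chunk.length() && Character.isLetter(chunk.charAt(i))) {
                    i++;
                }
                tokens.add(new int[] {s, i});
            } else {
                i++;
            }
        }
        // Slide a window of n tokens over the token list.
        List<Occurrence> out = new ArrayList<>();
        for (int t = 0; t + n <= tokens.size(); t++) {
            StringBuilder sb = new StringBuilder();
            for (int k = t; k < t + n; k++) {
                if (k > t) {
                    sb.append(' ');
                }
                sb.append(chunk, tokens.get(k)[0], tokens.get(k)[1]);
            }
            out.add(new Occurrence(sb.toString().toLowerCase(),
                    tokens.get(t)[0], tokens.get(t + n - 1)[1]));
        }
        return out;
    }

    // Number of occurrences of a feature among the extracted n-grams.
    static int count(List<Occurrence> occurrences, String feature) {
        int c = 0;
        for (Occurrence o : occurrences) {
            if (o.text.equals(feature)) {
                c++;
            }
        }
        return c;
    }
}
```

Calling `NGramSketch.ngrams("lovely, smart and beautiful", 2)` produces three bigrams, each carrying the offsets where it starts and ends in the chunk; collection-level counts would sum `count` over all chunks.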
How is chunk information modeled?
Chunk information is modeled in a Nora Chunk properties file. This is an object model of the nora-chunk API, along with workflow information on how to create such a file.
[1] Collection here means a collection of texts that share some common property. Collections themselves can have sub-collections or partitions.
|