Monk Datastore Overview


Searching for Core Objects

Each core object class provides static methods named find, which return all the objects of that class that satisfy a collection of search criteria.

There are many different criteria for the various objects and attributes. All of them extend the base class SearchCriterion. See the javadoc for a complete list.

Each criterion may specify one or more values, and has a variety of different constructors for your convenience, including single value constructors and multiple value constructors for vararg lists, arrays, and collections of tags and objects. The find methods also have multiple versions, for single criteria, vararg lists of criteria, and arrays and collections of criteria.
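As a rough illustration of how such convenience constructors can coexist in Java, here is a stand-alone sketch. The class TagCriterionSketch is hypothetical and is not part of the Monk API. It only shows that one vararg constructor already covers both the single value and the array cases, while a second constructor handles collections.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

/** Hypothetical criterion holding one or more tags. Not a Monk class. */
final class TagCriterionSketch {

    private final List<String> tags;

    /** Vararg constructor: accepts one tag, several tags, or a String[] array. */
    TagCriterionSketch(String... tags) {
        this.tags = new ArrayList<>(Arrays.asList(tags));
    }

    /** Collection constructor: copies the tags of any collection. */
    TagCriterionSketch(Collection<String> tags) {
        this.tags = new ArrayList<>(tags);
    }

    /** Returns the tags this criterion was built with. */
    List<String> tags() {
        return tags;
    }

    public static void main(String[] args) {
        String[] array = {"run (v)", "walk (v)"};
        System.out.println(new TagCriterionSketch("ncf").tags());
        System.out.println(new TagCriterionSketch("run (v)", "walk (v)").tags());
        System.out.println(new TagCriterionSketch(array).tags());
        System.out.println(new TagCriterionSketch(Arrays.asList(array)).tags());
    }
}
```

The last three calls in main build criteria with the same two values, which is the kind of flexibility the real constructors are described as offering.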

When you do a search, the collection of objects returned is the set of objects which satisfy all of the criteria you specify. For multiple values within a single criterion, the conditions are "or"'d together. For multiple criteria, the conditions are "and"'d together.
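The and/or rule can be modelled with plain Java predicates. The sketch below does not use the Monk classes: ToyWork, usesAnyLemma, inAnyCorpus, find, and the sample data are all hypothetical names invented for illustration. Each criterion matches if any of its values matches, and a work is returned only if every criterion matches.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

/** Stand-alone model of the and/or rule. None of these names are Monk API. */
final class CriteriaSketch {

    /** Hypothetical stand-in for a work: a title, a corpus tag, and its lemmas. */
    static final class ToyWork {
        final String title;
        final String corpus;
        final Set<String> lemmas;

        ToyWork(String title, String corpus, String... lemmas) {
            this.title = title;
            this.corpus = corpus;
            this.lemmas = new HashSet<>(Arrays.asList(lemmas));
        }
    }

    /** One criterion with several values: matches if ANY value is used ("or"). */
    static Predicate<ToyWork> usesAnyLemma(String... values) {
        return work -> Arrays.stream(values).anyMatch(work.lemmas::contains);
    }

    /** One criterion with several values: matches if the corpus is ANY of them. */
    static Predicate<ToyWork> inAnyCorpus(String... values) {
        return work -> Arrays.asList(values).contains(work.corpus);
    }

    /** Returns the titles of the works that satisfy ALL criteria ("and"). */
    @SafeVarargs
    static List<String> find(List<ToyWork> works, Predicate<ToyWork>... criteria) {
        List<String> titles = new ArrayList<>();
        for (ToyWork work : works) {
            if (Arrays.stream(criteria).allMatch(c -> c.test(work))) {
                titles.add(work.title);
            }
        }
        return titles;
    }

    /** Made-up sample data. */
    static List<ToyWork> sampleWorks() {
        return Arrays.asList(
            new ToyWork("work-a", "ncf", "run (v)", "sit (v)"),
            new ToyWork("work-b", "ncf", "sit (v)"),
            new ToyWork("work-c", "other", "walk (v)"),
            new ToyWork("work-d", "ncf", "walk (v)"));
    }

    public static void main(String[] args) {
        // Prints [work-a, work-d]: work-b uses neither lemma, work-c is in the wrong corpus.
        System.out.println(find(sampleWorks(),
            usesAnyLemma("run (v)", "walk (v)"),
            inAnyCorpus("ncf")));
    }
}
```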

For example, suppose we do a search for works with three criteria. The first criterion matches either of the lemmas "run (v)" or "walk (v)". The second criterion matches the Nineteenth Century Fiction (NCF) corpus. The third criterion matches circulation years of 1850 or later. We search for all the works which satisfy all three criteria. The result is the collection of all the works in NCF circulated in or after 1850 which use either of the verbs "run" or "walk". A statement to do this search is:


Collection<Work> works =
    Work.find(
        new LemmaCriterion("run (v)", "walk (v)"),
        new CorpusCriterion("ncf"),
        new CirculationYearCriterion(1850, null)
    );

Note the CirculationYearCriterion. In this criterion we specify 1850 for the start of a range and null for the end of a range. The value null specifies no restriction on the end of the range, so this criterion matches any circulation year which is greater than or equal to 1850. There are several such numeric range criteria, all of which work the same way. They can be used to specify a single number, an inclusive range of numbers, a greater than or equal condition, or a less than or equal condition.
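The four cases can be captured by a small stand-alone model. This is only a sketch of the described behaviour, not the Monk implementation: the class YearRangeSketch and its single factory method are hypothetical, and null is assumed to mean "no restriction" at either end of the range.

```java
/** Hypothetical model of a numeric range criterion. Not a Monk class. */
final class YearRangeSketch {

    private final Integer start; // null means no lower bound
    private final Integer end;   // null means no upper bound

    YearRangeSketch(Integer start, Integer end) {
        this.start = start;
        this.end = end;
    }

    /** Single number: a range whose start and end are the same year. */
    static YearRangeSketch single(int year) {
        return new YearRangeSketch(year, year);
    }

    /** Both bounds are inclusive; a null bound never excludes anything. */
    boolean matches(int year) {
        boolean afterStart = start == null || year >= start;
        boolean beforeEnd = end == null || year <= end;
        return afterStart && beforeEnd;
    }

    public static void main(String[] args) {
        YearRangeSketch from1850 = new YearRangeSketch(1850, null);
        System.out.println(from1850.matches(1850)); // true: the start is inclusive
        System.out.println(from1850.matches(1849)); // false

        YearRangeSketch upTo1850 = new YearRangeSketch(null, 1850);
        System.out.println(upTo1850.matches(1851)); // false
    }
}
```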

Now let's use the same three criteria to find authors rather than works:


Collection<Author> authors =
    Author.find(
        new LemmaCriterion("run (v)", "walk (v)"),
        new CorpusCriterion("ncf"),
        new CirculationYearCriterion(1850, null)
    );

With this change, we are now finding all the authors who wrote works in NCF circulated in or after 1850 which use either of the verbs "run" or "walk".

Now let's find spellings instead:


Collection<Spelling> spellings =
    Spelling.find(
        new LemmaCriterion("run (v)", "walk (v)"),
        new CorpusCriterion("ncf"),
        new CirculationYearCriterion(1850, null)
    );

With this change, we are now finding all the different spellings of the verbs "run" or "walk" used in NCF works circulated in or after 1850.

In general, the various search criteria can be used in any combination to search for any of the core objects. The lowest level of the implementation which makes this possible is called "the grand unified searcher." As we'll see later, the same searcher and the same criteria are used to find counters and words as well as core objects.

The find methods return their results in an undefined order. If you want results to be ordered, in the natural ordering for the result type or otherwise, you must call a sort method to sort them.

The examples below illustrate several different coding styles using these constructors and methods.

Example 1. Find all the authors in a corpus born in or after a given year.


/** Finds all authors in a corpus born in or after a given year.
 *
 *  @param  corpus      Corpus.
 *
 *  @param  year        Earliest birth year (inclusive).
 *
 *  @return             Authors in corpus born in or after year.
 *
 *  @throws ModelException
 */
 
Collection<Author> find (Corpus corpus, int year) 
    throws ModelException
{
    SearchCriterion corpusCriterion = new CorpusCriterion(corpus);
    SearchCriterion birthCriterion = new AuthorBirthYearCriterion(year, null);
    return Author.find(corpusCriterion, birthCriterion);
}

Example 2. Print all the lemmas used in a collection of work parts with sorting.


/** Prints all the lemmas used in a collection of work parts.
 *
 *  <p>The lemmas are sorted and printed first in case and diacritical-insensitive
 *  increasing alphabetical order by word class tag, then in case and
 *  diacritical-insensitive increasing alphabetical order by lemma tag.
 *
 *  @param  tags     Work part tags.
 *
 *  @throws ModelException
 */
 
void printLemmasForWorkParts (String[] tags)
    throws ModelException
{
    SearchCriterion criterion = new WorkPartCriterion(tags);
    Collection<Lemma> lemmas = Lemma.find(criterion);
    Lemma[] sortedLemmas = Lemma.sort(lemmas, Lemma.SortOption.WORD_CLASS_ASCENDING,
        Lemma.SortOption.TAG_ASCENDING);
    for (Lemma lemma : sortedLemmas) System.out.println(lemma.getTag());
}
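
The two-level ordering described in the comment above can be approximated with standard Java classes. The following is a sketch only: ToyLemma and LemmaSortSketch are hypothetical names, and using a java.text.Collator at PRIMARY strength to ignore case and diacritics is an assumption about how such an ordering might be built, not a description of what Lemma.sort does internally.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

/** Stand-alone sorting sketch. None of these names are Monk API. */
final class LemmaSortSketch {

    /** Hypothetical stand-in for a lemma: a tag and a word class tag. */
    static final class ToyLemma {
        final String tag;
        final String wordClass;

        ToyLemma(String tag, String wordClass) {
            this.tag = tag;
            this.wordClass = wordClass;
        }
    }

    /** Sorts by word class, then by tag, ignoring case and diacritics. */
    static List<ToyLemma> sort(Collection<ToyLemma> lemmas) {
        Collator collator = Collator.getInstance(Locale.ENGLISH);
        collator.setStrength(Collator.PRIMARY); // base letters only
        Comparator<ToyLemma> order = Comparator
            .comparing((ToyLemma lemma) -> lemma.wordClass, collator)
            .thenComparing(lemma -> lemma.tag, collator);
        List<ToyLemma> sorted = new ArrayList<>(lemmas); // leave the input untouched
        sorted.sort(order);
        return sorted;
    }

    /** Extracts the tags in list order. */
    static List<String> tags(List<ToyLemma> lemmas) {
        List<String> result = new ArrayList<>();
        for (ToyLemma lemma : lemmas) result.add(lemma.tag);
        return result;
    }

    /** Made-up sample data in no particular order. */
    static List<ToyLemma> sample() {
        return Arrays.asList(
            new ToyLemma("walk (v)", "v"),
            new ToyLemma("zeal (n)", "n"),
            new ToyLemma("Run (v)", "v"),
            new ToyLemma("\u00C9clat (n)", "n"),
            new ToyLemma("apple (n)", "n"));
    }

    public static void main(String[] args) {
        for (String tag : tags(sort(sample()))) System.out.println(tag);
    }
}
```

With the sample data, the nouns come first ("apple", then the accented and capitalized entry, then "zeal"), followed by the verbs ("Run", then "walk").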

Example 3. Print all the spellings of the headwords "love" or "hate" for an author.


/** Prints all the spellings of the headwords "love" or "hate" for an author.
 *
 *  @param  author  Author.
 *
 *  @throws ModelException
 */
 
void printSpellings (Author author)
    throws ModelException
{
    List<SearchCriterion> criteria = new ArrayList<SearchCriterion>();
    criteria.add(new AuthorCriterion(author));
    criteria.add(new LemmaPatternCriterion("love (*) | hate (*)"));
    for (Spelling spelling : Spelling.find(criteria))
        System.out.println(spelling.getTag());
}

Note the use of the LemmaPatternCriterion search criterion. The pattern uses the metacharacters "*" and "|" to represent "any sequence of zero or more characters" and "or" respectively. You can also use the metacharacter "." to match any single character. There are several pattern criteria.

We also could have used a LemmaHeadWordPatternCriterion:


    criteria.add(new LemmaHeadWordPatternCriterion("love | hate"));
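
To make the metacharacters concrete, here is a stand-alone sketch that translates such a pattern into a java.util.regex.Pattern. It is not how the pattern criteria are implemented. TagPatternSketch is a hypothetical class, and two details are assumptions: whitespace around "|" is ignored, and a pattern must match the whole tag.

```java
import java.util.regex.Pattern;

/** Hypothetical translator from tag patterns to regular expressions. */
final class TagPatternSketch {

    /** Builds a regex where "*" is any sequence, "." any one character, "|" is or. */
    static Pattern compile(String pattern) {
        StringBuilder regex = new StringBuilder();
        for (String alternative : pattern.split("\\|")) {
            if (regex.length() > 0) regex.append('|');
            regex.append("(?:");
            // Assumption: spaces around "|" are not part of the alternative.
            for (char c : alternative.trim().toCharArray()) {
                if (c == '*') {
                    regex.append(".*");
                } else if (c == '.') {
                    regex.append('.');
                } else {
                    // Everything else, including parentheses, is literal.
                    regex.append(Pattern.quote(String.valueOf(c)));
                }
            }
            regex.append(')');
        }
        return Pattern.compile(regex.toString());
    }

    /** Assumption: the pattern must match the entire tag. */
    static boolean matches(String pattern, String tag) {
        return compile(pattern).matcher(tag).matches();
    }

    public static void main(String[] args) {
        String pattern = "love (*) | hate (*)";
        System.out.println(matches(pattern, "love (v)"));     // true
        System.out.println(matches(pattern, "hate (n)"));     // true
        System.out.println(matches(pattern, "lovely (adj)")); // false
    }
}
```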

Example 4. Print all the works published in a given date range which use Italian words.


/** Prints all the works published in a given date range which use Italian words.
 *
 *  @param  start       Start year.
 *
 *  @param  end         End year.
 *
 *  @throws ModelException
 */
 
void printItalian (int start, int end)
    throws ModelException
{
    for (Work work : Work.find(
        new CirculationYearCriterion(start, end), 
        new PosCriterion("fw-it")))
            System.out.println(work.getTitle());
}
