TBRC’s Tibetan eText Repository is for everyone: practitioners, scholars, monks and nuns, researchers, and anyone else interested in Tibetan Buddhist literature. It contains more than 18,000 volumes and is fully “discoverable.” You can search for a place, a name, a topic, a title, or a term across many collections, through multiple traditions, and at any point in time. Global Search and the browsing entry points are the two primary ways to explore the TBRC digital library.
Global Search is based on TBRC’s twofold approach of deep search and deep context. When you search for a word or phrase using the Global Search bar at the top right of the TBRC homepage, the Lucene search engine and eXist database extensions find matches across the entire cataloged library. This is known as deep search match-highlighting. At the same time, TBRC’s classification system contextualizes these matches, placing each text within the deep context of the library metadata so that you can connect it with bibliographic information and scanned sources.
Global Search results can be filtered by license type, user access type, text-to-scan correspondence, genre (for example, biography, history, liturgy, terma), and subject (such as lineage, terminology, philosophical system, linguistics). Beneath each match, a list of items shows the bibliographic context along with the text immediately preceding and following the highlighted word or phrase.
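The idea of showing the text immediately before and after a highlighted match can be illustrated with a minimal keyword-in-context sketch. This is not TBRC's Lucene or eXist implementation; the function name `keyword_in_context`, the `width` parameter, and the transliterated sample sentence are assumptions made up for illustration.

```python
def keyword_in_context(text, term, width=20):
    """Return (before, match, after) triples for every occurrence of term.

    Hypothetical helper: a plain substring scan, not a real search engine.
    """
    if not term:
        return []  # an empty term would match everywhere; treat it as no hits
    hits = []
    start = text.find(term)
    while start != -1:
        end = start + len(term)
        before = text[max(0, start - width):start]  # up to `width` chars before
        after = text[end:end + width]               # up to `width` chars after
        hits.append((before, text[start:end], after))
        start = text.find(term, end)                # continue after this match
    return hits


# Made-up sample in Wylie transliteration, for illustration only.
sample = "rgyal rabs gsal ba'i me long is a well-known rgyal rabs text"
for before, match, after in keyword_in_context(sample, "rgyal rabs", width=12):
    print(f"...{before}[{match}]{after}...")
```

A production search engine would work from an index rather than scanning raw text, but the displayed result has the same shape: the match plus a window of surrounding content.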
On the homepage, along the top menu bar, you can browse the library by entry point: Works, eTexts, Genres, Subjects, Persons, or Places. For example, you can click Genres, select History, select rgyal rabs (royal chronicles), and research the associated works and lineages using the filters described in the previous paragraph.
It also helps to know the difference between the two kinds of eText in the TBRC digital library: OCR eTexts and Input eTexts. Input eTexts are Unicode Tibetan (TEI-XML) files converted from texts input by Tibetan authors, publishers, and monasteries in a variety of formats, including TibetDoc and Sambhota. OCR eTexts are generated using the Namsel OCR (Optical Character Recognition) program in partnership with the University of California, Berkeley. OCR eTexts retain text-to-scan page correspondence, an additional level of semantic information that is useful for referring back to the original text.
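To make text-to-scan page correspondence concrete, the sketch below groups the text of a tiny TEI-style document by page. It assumes pages are marked with the standard TEI page-break element `<pb n="..."/>`; whether TBRC's files encode pages this way is an assumption, and the sample document, the page labels, and the function name `text_by_page` are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical TEI-style fragment; real files would normally declare the
# TEI namespace, which the local_name() helper below tolerates.
SAMPLE = (
    '<text><body><p>'
    '<pb n="1a"/>sangs rgyas '
    '<pb n="1b"/>chos dang'
    '</p></body></text>'
)


def local_name(tag):
    """Strip an optional '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def text_by_page(xml_string):
    """Map each page label (the n attribute of <pb>) to the text that follows it."""
    pages = {}
    current = [None]  # label of the page being read; None before the first <pb>

    def add(chunk):
        if chunk and current[0] is not None:
            pages[current[0]] = pages.get(current[0], "") + chunk

    def walk(elem):
        if local_name(elem.tag) == "pb":
            current[0] = elem.get("n")
        add(elem.text)
        for child in elem:
            walk(child)
            add(child.tail)  # text that follows the child inside its parent

    walk(ET.fromstring(xml_string))
    # Normalise whitespace so each page is a clean single-line string.
    return {label: " ".join(text.split()) for label, text in pages.items()}


print(text_by_page(SAMPLE))
```

With a mapping like this, a search hit in the text can be traced back to the scanned page it came from.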
TBRC’s search interface continues to improve, thanks to its joint effort with researchers from SOAS, University of London, focusing on Tibetan “word breaking” and parts-of-speech tagging. Unlike English, Tibetan writes all words together, without marking where a word begins or ends. The more accurately TBRC search software can break Tibetan words apart, the more likely users are to find what they’re looking for. In addition, search software that recognizes syntax dramatically improves search results and leads to more sophisticated tools for Tibetan linguistics.
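As a rough illustration of what word breaking involves, the sketch below groups syllables into words with a greedy longest-match lookup against a word list. This is a generic textbook technique, not the method used by TBRC or SOAS. The three-entry `LEXICON`, the function name `segment`, and the use of Wylie transliteration (with spaces standing in for the syllable delimiter) are all assumptions for the example.

```python
def segment(syllables, lexicon, max_len=4):
    """Greedy longest-match word breaking over a list of syllables.

    At each position, try the longest run of syllables found in the
    lexicon; fall back to a single syllable when nothing matches.
    """
    words = []
    i = 0
    while i < len(syllables):
        longest = min(max_len, len(syllables) - i)
        for n in range(longest, 0, -1):
            candidate = " ".join(syllables[i:i + n])
            if n == 1 or candidate in lexicon:
                words.append(candidate)
                i += n
                break
    return words


# Tiny hypothetical lexicon in Wylie transliteration.
LEXICON = {"sangs rgyas", "bla ma", "chos"}

print(segment("sangs rgyas kyi chos".split(), LEXICON))
# ['sangs rgyas', 'kyi', 'chos']
```

Greedy matching is easy to follow but makes mistakes whenever a shorter word would have been the right reading, which is one reason more accurate word breaking and parts-of-speech tagging matter for search quality.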