`public class TokenizingLanguage extends ChunkedLanguageParser`

A `LanguageParser` that uses `Tokenizer` instances to split text into sentences and tokens. The tokenizers should use `OffsetLocation` for their tokens so that the language can correctly map each token's location to its actual location in the `TextSource`.

Inherited fields: `encounter`

| Modifier | Constructor and Description |
|---|---|
| protected | `TokenizingLanguage(Locale locale, TokenizerFactory paragraphTokenizers, TokenizerFactory sentenceTokenizers, LanguageEncounter encounter)` |
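The class description outlines a two-level split: text into sentences, then sentences into tokens, with token locations mapped back to the source. The sketch below illustrates that flow under stated assumptions. It uses the JDK's `java.text.BreakIterator` as a stand-in for the two `TokenizerFactory` arguments, and a hypothetical `Events` interface loosely modelled on the inherited `startSentence`, `emitToken` and `endSentence` methods. `TwoLevelTokenizingSketch`, `Events` and `trace` are illustrative names, not part of the library, and this is not the library's implementation. It only shows why a token's offset, which is relative to its sentence, must be shifted to be meaningful in the original text.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class TwoLevelTokenizingSketch {
    // Hypothetical callbacks, loosely mirroring startSentence / emitToken / endSentence.
    interface Events {
        void startSentence(int offset);
        void emitToken(int offset, String text);
        void endSentence(int offset);
    }

    // Split a chunk into sentences, then each sentence into word tokens.
    // BreakIterator stands in for the paragraph and sentence TokenizerFactory.
    static void handleChunk(CharSequence chunk, Locale locale, Events events) {
        String text = chunk.toString();
        BreakIterator sentences = BreakIterator.getSentenceInstance(locale);
        sentences.setText(text);
        int sStart = sentences.first();
        for (int sEnd = sentences.next(); sEnd != BreakIterator.DONE;
                sStart = sEnd, sEnd = sentences.next()) {
            String sentence = text.substring(sStart, sEnd);
            events.startSentence(sStart);
            BreakIterator words = BreakIterator.getWordInstance(locale);
            words.setText(sentence);
            int wStart = words.first();
            for (int wEnd = words.next(); wEnd != BreakIterator.DONE;
                    wStart = wEnd, wEnd = words.next()) {
                String word = sentence.substring(wStart, wEnd);
                if (!word.isBlank()) {
                    // Word offsets are relative to the sentence; adding sStart
                    // maps them back to positions in the chunk.
                    events.emitToken(sStart + wStart, word);
                }
            }
            // sEnd may include whitespace that trails the sentence.
            events.endSentence(sEnd);
        }
    }

    // Record the events as strings so the call order is easy to inspect.
    static List<String> trace(CharSequence chunk, Locale locale) {
        List<String> log = new ArrayList<>();
        handleChunk(chunk, locale, new Events() {
            public void startSentence(int offset) { log.add("start " + offset); }
            public void emitToken(int offset, String text) { log.add("token " + offset + " " + text); }
            public void endSentence(int offset) { log.add("end " + offset); }
        });
        return log;
    }

    public static void main(String[] args) {
        trace("Hello there. How are you?", Locale.ENGLISH).forEach(System.out::println);
    }
}
```

For `"Hello there. How are you?"` the second sentence starts at offset 13, so its first word is reported at 13 even though the word iterator saw it at position 0 of the sentence. This is the kind of mapping that the `OffsetLocation` requirement is meant to make possible.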
| Modifier and Type | Method and Description |
|---|---|
| static LanguageFactory | `create(Locale locale, TokenizerFactory paragraphTokenizers, TokenizerFactory sentenceTokenizers)` Create a `Function` that can create a parser for the given locale. |
| static LanguageParser | `create(Locale locale, TokenizerFactory paragraphTokenizers, TokenizerFactory sentenceTokenizers, LanguageEncounter encounter)` Create an instance for the given locale and tokenizers. |
| protected void | `handleChunk(CharSequence sequence)` Handle the given sequence of characters. |
| Locale | `locale()` Get the locale of this parser. |
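The two `create` overloads differ in when the `LanguageEncounter` is supplied. Assuming `LanguageFactory` behaves like a function from an encounter to a parser (the return description of the three-argument `create` suggests this, but this page does not spell out the interface), the relationship can be sketched with `java.util.function.Function`. `FactorySketch`, `Encounter`, `Parser` and `receive` are hypothetical stand-ins, and the tokenizer factory parameters are omitted to keep the shape of the API visible.

```java
import java.util.Locale;
import java.util.function.Function;

public class FactorySketch {
    // Hypothetical stand-in for LanguageEncounter: receives parser results.
    interface Encounter {
        void receive(String result);
    }

    // Hypothetical stand-in for a TokenizingLanguage instance.
    static final class Parser {
        private final Locale locale;
        private final Encounter encounter;

        // Not public: callers go through the static create(...) methods.
        private Parser(Locale locale, Encounter encounter) {
            this.locale = locale;
            this.encounter = encounter;
        }

        Locale locale() {
            return locale;
        }

        void handleChunk(CharSequence chunk) {
            // A real parser would tokenize here; this sketch just forwards the text.
            encounter.receive(locale.toLanguageTag() + ": " + chunk);
        }
    }

    // Mirrors create(locale, ..., encounter): one parser bound to one encounter.
    static Parser create(Locale locale, Encounter encounter) {
        return new Parser(locale, encounter);
    }

    // Mirrors create(locale, ...): fix the configuration now and return a
    // function that builds a parser once an encounter is supplied.
    static Function<Encounter, Parser> create(Locale locale) {
        return encounter -> create(locale, encounter);
    }

    public static void main(String[] args) {
        Function<Encounter, Parser> factory = create(Locale.ENGLISH);
        Parser parser = factory.apply(System.out::println);
        parser.handleChunk("Hello there.");
    }
}
```

Fixing the locale and tokenizers up front and supplying the encounter later lets one configuration be reused for any number of encounters.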
Inherited methods: `emitToken`, `emitToken`, `emitToken`, `endSentence`, `flush`, `startSentence`, `text`, `text`

`protected TokenizingLanguage(Locale locale, TokenizerFactory paragraphTokenizers, TokenizerFactory sentenceTokenizers, LanguageEncounter encounter)`
`public Locale locale()`

Get the locale of this parser. Description copied from `LanguageParser`.

`protected void handleChunk(CharSequence sequence)`

Handle the given sequence of characters. … `ChunkedLanguageParser.startSentence(int)`, `emitToken(int, se.l4.lect.Token.TokenType, String)` and `ChunkedLanguageParser.endSentence(int)`.

Description copied from `ChunkedLanguageParser`, where `handleChunk` is declared.

`public static LanguageParser create(Locale locale, TokenizerFactory paragraphTokenizers, TokenizerFactory sentenceTokenizers, LanguageEncounter encounter)`
Create an instance for the given locale and tokenizers.

Parameters:
- `locale` - the `Locale` that the parser is for
- `paragraphTokenizers` - the tokenizer factory used to tokenize a paragraph into sentences. The tokenizers it creates must use `OffsetLocation` for their tokens.
- `sentenceTokenizers` - the tokenizer factory used to tokenize a sentence into individual tokens. The tokenizers it creates must use `OffsetLocation` for their tokens.
- `encounter` - the encounter that should receive results

Returns: a `TokenizingLanguage`

`public static LanguageFactory create(Locale locale, TokenizerFactory paragraphTokenizers, TokenizerFactory sentenceTokenizers)`
Create a `Function` that can create a parser for the given locale.

Parameters:
- `locale` - the `Locale` that the parser is for
- `paragraphTokenizers` - the tokenizer factory used to tokenize a paragraph into sentences. The tokenizers it creates must use `OffsetLocation` for their tokens.
- `sentenceTokenizers` - the tokenizer factory used to tokenize a sentence into individual tokens. The tokenizers it creates must use `OffsetLocation` for their tokens.

Returns: a function that creates a `TokenizingLanguage` for a given `LanguageEncounter`

Copyright © 2018. All rights reserved.