public interface TokenPattern
A pattern that works much as a regular expression pattern does, but matches on
tokens and their properties.
Patterns are compiled via compile(String) and follow a simple
format. A token is matched using its type or the special type any,
and the tokens in a sequence are separated by spaces. For example,
word symbol would match a word followed by a
symbol.
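The sequence format described above can be illustrated with a minimal, self-contained sketch. This is not the library's implementation: the Token class, the find method, and the class name are hypothetical and exist only to show how a space-separated list of types, including the special type any, could be checked against a list of tokens.

```java
import java.util.Arrays;
import java.util.List;

public class TypeSequenceSketch {
    /** Hypothetical token: a type name such as "word" or "symbol" plus its text. */
    public static final class Token {
        public final String type;
        public final String text;

        public Token(String type, String text) {
            this.type = type;
            this.text = text;
        }
    }

    /** Returns the index at which the type sequence first matches, or -1 if it never does. */
    public static int find(String pattern, List<Token> tokens) {
        // Types in a pattern are separated by spaces
        String[] types = pattern.trim().split("\\s+");
        for (int start = 0; start + types.length <= tokens.size(); start++) {
            boolean ok = true;
            for (int i = 0; i < types.length && ok; i++) {
                String wanted = types[i];
                // "any" accepts every token; otherwise the type must be equal
                ok = wanted.equals("any") || wanted.equals(tokens.get(start + i).type);
            }
            if (ok) {
                return start;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<Token> tokens = Arrays.asList(
            new Token("word", "Hello"),
            new Token("word", "world"),
            new Token("symbol", "!"));
        System.out.println(find("word symbol", tokens)); // 1
        System.out.println(find("any any any", tokens)); // 0
        System.out.println(find("symbol word", tokens)); // -1
    }
}
```

The real TokenPattern also supports text, property, and regular expression conditions, which this sketch leaves out.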
Tokens can also be matched on their text or properties:
// Match any token
TokenPattern.compile("any");
// Match a word
TokenPattern.compile("word");
// Match against token.getText()
TokenPattern.compile("word='Test'");
// Shortcut to match the text of any type of token
TokenPattern.compile("'Test'");
// Match against TokenProperty.NORMALIZED
TokenPattern.compile("word,normalized='test'");
// Match word followed by symbol
TokenPattern.compile("word symbol");
// Match against regular expression
TokenPattern.compile("word=/test/i");
// Shortcut to match via regex for any type of token
TokenPattern.compile("/test/i");
By default, whitespace tokens are ignored. To enable whitespace matching, use
WITH_WHITESPACE:
TokenPattern.compile("word whitespace symbol", TokenPattern.WITH_WHITESPACE);
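One way to picture the effect of the flag is as a filter applied before matching. The sketch below is an assumption about the behavior, not the library's code: the flag value 1, the visibleTypes method, and the use of plain type names in place of tokens are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class WhitespaceFlagSketch {
    // Hypothetical flag value; the value of the real constant is not documented here
    public static final int WITH_WHITESPACE = 1;

    /** Returns the token types a matcher would get to see under the given flags. */
    public static List<String> visibleTypes(List<String> types, int flags) {
        boolean keepWhitespace = (flags & WITH_WHITESPACE) != 0;
        List<String> visible = new ArrayList<>();
        for (String type : types) {
            // Whitespace tokens are dropped unless the flag is set
            if (keepWhitespace || !type.equals("whitespace")) {
                visible.add(type);
            }
        }
        return visible;
    }

    public static void main(String[] args) {
        List<String> types = Arrays.asList("word", "whitespace", "symbol");
        System.out.println(visibleTypes(types, 0));               // [word, symbol]
        System.out.println(visibleTypes(types, WITH_WHITESPACE)); // [word, whitespace, symbol]
    }
}
```

Under this reading, a pattern such as word symbol matches a word and a symbol separated by spaces when no flags are given, while a pattern that names whitespace explicitly only makes sense with the flag set.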
Tokens can be made optional, negated, or repeated:
// Optional symbol followed by a word
TokenPattern.compile("symbol? word");
// Words that are not preceded by a symbol
TokenPattern.compile("!symbol word");
// Match the token at least once
TokenPattern.compile("symbol='$'+ word");
// Match the token zero or more times
TokenPattern.compile("symbol* word");
// Match the token twice
TokenPattern.compile("symbol{2} word");
// Match the token between one and five times
TokenPattern.compile("symbol,normalized='#'{1,5} word");
// Match the token zero to five times
TokenPattern.compile("symbol{,5} word");
// Match the token two or more times
TokenPattern.compile("symbol{2,} word");
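The repetition suffixes above each map to a minimum and maximum count. The following sketch spells out that mapping as described in the comments; the bounds method and class name are hypothetical and are not part of TokenPattern, and Integer.MAX_VALUE is used here only as a stand-in for "no upper limit".

```java
public class QuantifierSketch {
    /** Returns {min, max} for a quantifier suffix; a max of Integer.MAX_VALUE means unbounded. */
    public static int[] bounds(String quantifier) {
        switch (quantifier) {
            case "":  return new int[] {1, 1};                 // no suffix: exactly once
            case "?": return new int[] {0, 1};                 // optional
            case "+": return new int[] {1, Integer.MAX_VALUE}; // at least once
            case "*": return new int[] {0, Integer.MAX_VALUE}; // zero or more times
        }
        if (!quantifier.startsWith("{") || !quantifier.endsWith("}")) {
            throw new IllegalArgumentException("Unknown quantifier: " + quantifier);
        }
        String body = quantifier.substring(1, quantifier.length() - 1);
        int comma = body.indexOf(',');
        if (comma < 0) {
            // {n}: exactly n times
            int n = Integer.parseInt(body);
            return new int[] {n, n};
        }
        // {n,m}, {,m} and {n,}: a missing side falls back to 0 or unbounded
        String low = body.substring(0, comma);
        String high = body.substring(comma + 1);
        int min = low.isEmpty() ? 0 : Integer.parseInt(low);
        int max = high.isEmpty() ? Integer.MAX_VALUE : Integer.parseInt(high);
        return new int[] {min, max};
    }

    public static void main(String[] args) {
        int[] b = bounds("{,5}");
        System.out.println(b[0] + ".." + b[1]); // 0..5
    }
}
```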
Groups are also supported:
// Use parentheses to create an optional group of Mrs + period
TokenPattern.compile("(word,normalized='mrs' symbol,text='.',continuation)? word");
// Use brackets to create an OR between tokens or groups
TokenPattern.compile("[word,normalized='mrs' word,normalized='mr'] symbol,text='.',continuation?");
| Modifier and Type | Field and Description |
|---|---|
| static int | WITH_WHITESPACE |

| Modifier and Type | Method and Description |
|---|---|
| static TokenPattern | compile(String pattern): Compile a pattern that can be used to match tokens. |
| static TokenPattern | compile(String pattern, int flags): Compile a pattern that can be used to match tokens. |
| TokenMatcher | matcher(): Create a new matcher for streaming matching. |
static final int WITH_WHITESPACE
TokenMatcher matcher()
static TokenPattern compile(String pattern)
Parameters: pattern
static TokenPattern compile(String pattern, int flags)
Parameters: pattern, flags
Copyright © 2018. All rights reserved.