A comprehensive, low-level core Scanner suitable for any text parsing and/or Lexer construction.
The Text Scanner is used to extract Tokens from text, check for strings or characters (delimiters), skip over text, etc.
Incorporates the following basic operations:
- Maintains an Index (a scan pointer) for all scanning operations.
- Check for characters or string at the current Index (or Index + offset).
- Scan up to characters or strings.
- Scanned text is recorded in Token for later access.
- Several Skipping operations.
- Some operations also record a delimiter in Delim.
- Error logging via ScanErrorLog (which also records the position of the error for later reporting).
In the documentation below the following abbreviations are used:
- Eos: End of Source (or string)
- Eol: End of Line
Members | Description |
Constructor: | |
C: TextScanner(string source, ScanErrorLog errorLog = null) |
Create a TextScanner with the given source string and either an 'internal' (errorLog == null) or an 'external' ScanErrorLog. |
Implementation: | |
P: char Delim |
Get last delimiter logged (where applicable). |
P: string Match |
Get the matching string for the last IsAnyString, SkipToAnyStr or ScanToAnyStr call. |
P: ScanErrorLog ErrorLog |
Get/Set the bound ScanErrorLog. |
Token operations: | Notes: Several scanning operations record the scanned text in Token. The following services are used to operate on this token. |
P: bool IsToken |
Check if a Token currently exists. |
M: void SetTokenRange(int startIndex, int endIndex) |
Manually set the Token start and end index used to retrieve the Token the next time it is read: - The scanner automatically maintains these indexes for any operation that records a token. - This should only be used in special cases (e.g. for extensions). The values are set to 0 (empty Token) if out of range. Parameters: startIndex: The zero-based starting position, or less than zero for the current Index position. endIndex: The zero-based ending position; adjusts to Eos if less than zero or out of range. |
P: string Token |
Get the current Token, or string.Empty if there is none. |
P: string TrimToken |
Get the current Token, trimmed. |
M: bool ValidToken() |
Check if the current Token is not null or WhiteSpace. |
Source Management: | |
M: void Insert(string text) |
Insert text at the current Index, and continue scanning from there. |
M: void InsertLine(string text) |
Insert text and newline (\r\n or \n) at the current Index, and continue scanning from there. |
M: void Remove(int startIndex) |
Remove a section of the Source string, from startIndex up to, but excluding, current Index. |
M: void SetSource(string source) |
Set the Scanner Source from a String and reset Index to start. |
M: string SubSource(int startIndex, int length = -1) |
Retrieve a substring of the scanner Source: - Mainly used for debugging and tracing. Parameters: startIndex: The zero-based starting position, or less than zero for the current Index position. length: The number of characters to retrieve; adjusts to Eos if less than zero or out of range. Returns: The substring starting at startIndex with the given length, or an empty string if startIndex is greater than the source length or length is zero. |
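The SubSource rules can be sketched as a standalone function. This is a minimal sketch under the stated assumptions, not the library's implementation: the hypothetical SubSource helper takes an explicit index parameter that stands in for the scanner's current Index.

```csharp
using System;

// Hypothetical sketch of the documented SubSource rules (not the library's code).
// 'index' stands in for the scanner's current Index.
static string SubSource(string source, int index, int startIndex, int length = -1)
{
    // A negative startIndex means "start at the current Index".
    int start = startIndex < 0 ? index : startIndex;

    // Past the end of the source, or a zero length: empty string.
    if (start >= source.Length || length == 0)
        return string.Empty;

    // A negative or over-long length is clamped to Eos.
    int available = source.Length - start;
    int count = (length < 0 || length > available) ? available : length;
    return source.Substring(start, count);
}

string src = "name = value";
Console.WriteLine(SubSource(src, 7, -1));     // value
Console.WriteLine(SubSource(src, 0, 0, 4));   // name
Console.WriteLine(SubSource(src, 0, 99));     // (empty line)
```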
Index Management: | |
P: int Index |
Get: current scan index. Set: scan index (0 = start, < 0 or > length = end, else intermediate). |
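A sketch of the Index setter rule, using a hypothetical ResolveIndex helper and assuming "end" means an Index equal to the source length. Under that reading, assigning a negative Index is a shorthand for moving to Eos, the same position ToEos() reaches.

```csharp
using System;

// Hypothetical sketch of the documented Index setter rule (not the library's code):
// a negative value or one beyond the source length means Eos; otherwise it is kept.
static int ResolveIndex(int value, int sourceLength) =>
    (value < 0 || value > sourceLength) ? sourceLength : value;

Console.WriteLine(ResolveIndex(0, 10));    // 0  (start)
Console.WriteLine(ResolveIndex(4, 10));    // 4  (intermediate)
Console.WriteLine(ResolveIndex(-1, 10));   // 10 (Eos)
Console.WriteLine(ResolveIndex(99, 10));   // 10 (Eos)
```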
Core Utilities: | |
M: int CountCh(char c) |
Get the count of consecutive characters matching c at Index, and advance Index past them. |
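A minimal sketch of the CountCh counting rule. The hypothetical CountCh helper below returns the advanced index alongside the count, where the real method advances the scanner's Index instead.

```csharp
using System;

// Hypothetical sketch of CountCh: count consecutive c from index and
// return the advanced index (the real method advances Index instead).
static (int count, int newIndex) CountCh(string source, int index, char c)
{
    int i = index;
    while (i < source.Length && source[i] == c)
        i++;
    return (i - index, i);
}

// e.g. measure a Markdown-style heading level, then continue after the '#'s.
var (level, next) = CountCh("### Title", 0, '#');
Console.WriteLine($"level {level}, next {next}");   // level 3, next 3
```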
M: bool IsAnyCh(string chars) |
Check if character at Index is one of the chars. Returns: True: if found, advances the Index and logs the char in Delim. False: if not found and Index is unchanged. |
M: bool IsAnyString(IEnumerable<string> matchStrings, bool advanceIndex = true, StringComparison comp = StringComparison.InvariantCultureIgnoreCase) |
Check if text at Index equals any string in matchStrings and optionally advance the Index if it matches. - Match contains the matching string. Parameters: matchStrings: Enumerable set of strings to match. advanceIndex: Advance Index to just after the match (default) or not. comp: Comparison type (default = StringComparison.InvariantCultureIgnoreCase). |
M: bool IsAnyString(string matchStrings, bool advanceIndex = true, StringComparison comp = StringComparison.InvariantCultureIgnoreCase) |
Check if text at Index equals any string in delimited matchStrings and optionally advance the Index if it matches. - Match contains the matching string. Parameters: matchStrings: Delimited strings; the first character must be the delimiter (e.g. "|s1|s2|..."). advanceIndex: Advance Index to just after the match (default) or not. comp: Comparison type (default = StringComparison.InvariantCultureIgnoreCase). |
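A sketch of the delimited matchStrings convention, using a hypothetical MatchAnyAt helper. It assumes empty entries (from the leading and trailing delimiter) are ignored and that the first listed string that matches wins. Under that assumption, a longer string should be listed before any string that is its prefix, e.g. "|<=|<|" rather than "|<|<=|".

```csharp
using System;

// Hypothetical sketch of the delimited matchStrings convention ("|s1|s2|...").
// Assumptions: empty entries are ignored and the first listed match wins.
static string? MatchAnyAt(string source, int index, string matchStrings,
    StringComparison comp = StringComparison.InvariantCultureIgnoreCase)
{
    if (string.IsNullOrEmpty(matchStrings))
        return null;

    char delim = matchStrings[0];                  // first char is the delimiter
    foreach (string s in matchStrings.Split(delim, StringSplitOptions.RemoveEmptyEntries))
    {
        if (index + s.Length <= source.Length &&
            string.Compare(source, index, s, 0, s.Length, comp) == 0)
            return s;                              // what Match would hold
    }
    return null;
}

string sql = "WHERE x <= 1";
Console.WriteLine(MatchAnyAt(sql, 0, "|select|where|from|"));   // where
Console.WriteLine(MatchAnyAt(sql, 8, "|<=|<|"));                // <=
```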
M: bool IsCh(char c) |
Check if the character at Index matches c and advance Index if true. |
P: bool IsEol |
Query if Index is at End of Line. |
P: bool IsEos |
Check if Index is at End of Source. |
P: bool IsEosOrEol |
Query if Index is at Eos or Eol. |
M: bool IsPeekAnyCh(string chars, int offset = 0) |
Check if character at relative offset to Index matches any one of the chars (index unchanged). |
M: bool IsPeekCh(char c, int offset = 0) |
Check if character at relative offset to Index matches c (index unchanged). |
M: bool IsString(string matchString, bool advanceIndex = true, StringComparison comp = StringComparison.InvariantCultureIgnoreCase) |
Check if text at Index equals matchString and optionally advance the Index if it matches. Parameters: matchString: String to match. advanceIndex: Advance Index to just after the match (default) or not. comp: Comparison type (default = StringComparison.InvariantCultureIgnoreCase). |
M: char NextCh() |
Get the character at Index and increment Index, or '0' for Eos. |
M: char PeekCh(int offset = 0) |
Get character at relative offset to Index (index unchanged). Returns: Character or Eos ('0') if out of range. |
M: void ToEos() |
Advance Index to Eos. |
Skipping Operations: | |
M: bool Skip(char skipChar) |
Skip while character is skipChar. Returns: True if not Eos after skipping else false. |
M: bool SkipAny(string skipChars) |
Skip while character is any of the skipChars. Returns: True if not Eos after skipping else false. |
M: bool SkipTo(char termChar, bool skipOver = false) |
Skip until the termChar is found: - Optionally skip over the delimiter if skipOver is true. Returns: True: Found and Index at matching char or next if skipOver = true. False: Not found or Eos and Index unchanged. |
M: bool SkipToAny(string termChars, bool skipOver = false) |
Skip until any one of the termChars is found. - Delim contains the matching character. - Optionally skip over the delimiter if skipOver is true. Returns: True: Found and Index at matching char or next if skipOver = true. False: Not found or Eos and Index unchanged. |
M: bool SkipToStr(string text, bool skipOver = false, StringComparison comp = StringComparison.InvariantCultureIgnoreCase) |
Skip up to the given text and optionally skip over it if skipOver is true. Parameters: comp: Comparison type (default = StringComparison.InvariantCultureIgnoreCase). Returns: True: Found and Index at the start of the matching text, or just after it if skipOver = true. False: Not found or Eos, and Index unchanged. |
M: bool SkipToAnyStr(IEnumerable<string> matchStrings, bool skipOver = false, StringComparison comp = StringComparison.InvariantCultureIgnoreCase) |
Skip up to the first occurrence of any string in matchStrings and optionally skip over the matching string. - Match contains the matching string. Parameters: matchStrings: Enumerable set of strings. skipOver: Advance Index to just after the match (default = false) or not. comp: Comparison type (default = StringComparison.InvariantCultureIgnoreCase). Returns: True: Found and Index at the start of the matching text, or just after it if skipOver = true. False: Not found or Eos, and Index unchanged. |
M: bool SkipToAnyStr(string matchStrings, bool skipOver = false, StringComparison comp = StringComparison.InvariantCultureIgnoreCase) |
Skip up to the first occurrence of any string in delimited matchStrings and optionally skip over the matching string. - Match contains the matching string. Parameters: matchStrings: Delimited strings; the first character must be the delimiter (e.g. "|s1|s2|..."). skipOver: Advance Index to just after the match (default = false) or not. comp: Comparison type (default = StringComparison.InvariantCultureIgnoreCase). Returns: True: Found and Index at the start of the matching text, or just after it if skipOver = true. False: Not found or Eos, and Index unchanged. |
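A sketch of the "first occurrence of any string" search described for SkipToAnyStr (and ScanToAnyStr), using a hypothetical FindAnyStr helper. The earliest position in the source wins regardless of list order; when several strings match at the same position, this sketch assumes the first listed one wins.

```csharp
using System;

// Hypothetical sketch: find the earliest position at or after 'index' where
// any of the strings occurs. Returns (position, matched) or (-1, null).
// At a given position, the first listed string wins (an assumption).
static (int pos, string? match) FindAnyStr(string source, int index, string[] matchStrings,
    StringComparison comp = StringComparison.InvariantCultureIgnoreCase)
{
    for (int i = index; i < source.Length; i++)
        foreach (string s in matchStrings)
            if (s.Length > 0 && i + s.Length <= source.Length &&
                string.Compare(source, i, s, 0, s.Length, comp) == 0)
                return (i, s);
    return (-1, null);
}

// "*/" is listed first, but "//" occurs earlier in the text, so it wins.
string text = "a = 1; // note */ end";
var (pos, match) = FindAnyStr(text, 0, new[] { "*/", "//" });
Console.WriteLine($"{pos} {match}");   // 7 //
```

With skipOver = true, the new Index would simply be pos + match.Length.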
M: bool SkipToEol(bool skipOver = true) |
Skip to Eol or Eos (last line). - Optionally skip over the Eol if skipOver is true. Returns: False if started at Eos else True. |
M: bool SkipEol() |
Skip one NewLine. Must currently be at the newline (else ignored). Returns: True if not Eos after skipping else false. |
M: bool SkipConsecEol() |
Skip all consecutive NewLines. Must currently be at a newline (else ignored). Returns: True if not Eos after skipping else false. |
M: void SkipWhile(Func<char, bool> predicate) |
Skip all characters while the predicate matches (returns true), or Eos is reached. |
M: bool SkipBlock(string blockStart, string blockEnd, bool isOpen = false) |
Skip a block delimited by blockStart and blockEnd: - Handles nesting. Parameters: isOpen: False if Index is at the start of the block, true if Index is just inside it. Returns: True if not at the start of a non-open block, or for a valid block (Index positioned after the block). Otherwise false, an error is logged and Index is unchanged. |
Scanning Operations: | |
M: bool ScanTo(char delim, bool orToEos = false, bool skipOver = false) |
Scans up to the delim or to Eos (if orToEos is true): - Optionally skip over the delimiter if skipOver is true. - Token contains the intermediate text (excluding the delimiter). Returns: True: Delimiter found or orToEos is true; Index at Eos, at the delimiter, or after the delimiter if skipOver is true. False: Started at Eos, or delimiter not found (and orToEos is false); Index unchanged. |
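A sketch of the documented ScanTo outcomes as a pure function. The hypothetical ScanTo helper returns (ok, token, newIndex) instead of updating scanner state, and leaves the token empty on failure; both are assumptions of this sketch.

```csharp
using System;

// Hypothetical sketch of ScanTo's documented outcomes. It returns
// (ok, token, newIndex) instead of updating scanner state; the token is
// left empty on failure (an assumption).
static (bool ok, string token, int index) ScanTo(string source, int index, char delim,
    bool orToEos = false, bool skipOver = false)
{
    if (index >= source.Length)
        return (false, string.Empty, index);                  // started at Eos

    int found = source.IndexOf(delim, index);
    if (found < 0)
        return orToEos
            ? (true, source.Substring(index), source.Length)  // token runs to Eos
            : (false, string.Empty, index);                   // Index unchanged

    string token = source.Substring(index, found - index);   // excludes delim
    return (true, token, skipOver ? found + 1 : found);
}

// Parse "key=value": scan the key, skip '=', then take the rest to Eos.
var key = ScanTo("key=value", 0, '=', skipOver: true);
var val = ScanTo("key=value", key.index, ';', orToEos: true);
Console.WriteLine($"{key.token} -> {val.token}");   // key -> value
```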
M: bool ScanToAny(string delims, bool orToEos = false) |
Scans up to any character in delims or to Eos (if orToEos is true): - Token contains the intermediate text (excluding the delimiter). Returns: True: Delimiter found or orToEos is true; Index at the delimiter or Eos. False: Started at Eos, delimiter not found (and orToEos is false), or delims is blank; Index unchanged. |
M: bool ScanToStr(string findString, StringComparison comp = StringComparison.InvariantCultureIgnoreCase) |
Scan up to a match of findString: - Token contains the intermediate text (excluding findString). Parameters: comp: Comparison type (default = StringComparison.InvariantCultureIgnoreCase). Returns: True: findString found and Index directly after findString. False: findString not found and Index remains at its original position. |
M: bool ScanToAnyStr(IEnumerable<string> matchStrings, bool skipOver = false, StringComparison comp = StringComparison.InvariantCultureIgnoreCase) |
Scan up to the first occurrence of any string in matchStrings: - Token contains the intermediate text (excluding the matching string). - Match contains the matching string. Parameters: matchStrings: Enumerable set of strings. skipOver: Advance Index to just after the match (default = false) or not. comp: Comparison type (default = StringComparison.InvariantCultureIgnoreCase). Returns: True: Found and Index at the start of the matching text, or just after it if skipOver = true. False: Not found or Eos; Index unchanged. |
M: bool ScanToAnyStr(string matchStrings, bool skipOver = false, StringComparison comp = StringComparison.InvariantCultureIgnoreCase) |
Scan up to the first occurrence of any string in delimited matchStrings: - Token contains the intermediate text (excluding the matching string). - Match contains the matching string. Parameters: matchStrings: Delimited strings; the first character must be the delimiter (e.g. "|s1|s2|..."). skipOver: Advance Index to just after the match (default = false) or not. comp: Comparison type (default = StringComparison.InvariantCultureIgnoreCase). Returns: True: Found and Index at the start of the matching text, or just after it if skipOver = true. False: Not found or Eos; Index unchanged. |
M: bool ScanToEol(bool skipEol = true) |
Scan to Eol and optionally skip over Eol: - Handles intermediate or last line (with no Eol). - Token contains the intermediate text (excluding the newline, may be empty). Returns: False if started at Eos else true. |
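A sketch of line-by-line scanning in the style of ScanToEol, assuming "\r\n" and "\n" are the Eol forms. This sketch also treats a lone '\r' as Eol, which may differ from the library; the ScanToEol and Lines helpers are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: returns the line text (Token) and the next index.
// "\r\n" and "\n" end a line; a lone '\r' is also treated as Eol here (assumption).
static (string token, int next) ScanToEol(string source, int index, bool skipEol = true)
{
    int end = index;
    while (end < source.Length && source[end] != '\r' && source[end] != '\n')
        end++;

    string token = source.Substring(index, end - index);   // excludes the newline
    int next = end;
    if (skipEol && next < source.Length)
        next += (source[next] == '\r' && next + 1 < source.Length && source[next + 1] == '\n') ? 2 : 1;
    return (token, next);
}

// Split a source into lines; the final line may have no Eol.
static List<string> Lines(string source)
{
    var lines = new List<string>();
    int i = 0;
    while (i < source.Length)          // stop at Eos, where ScanToEol would return false
    {
        var (line, next) = ScanToEol(source, i);
        lines.Add(line);
        i = next;
    }
    return lines;
}

Console.WriteLine(string.Join("|", Lines("a\r\nb\n\nlast")));   // a|b||last
```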
M: bool ValueToEol(bool skipEol = true) |
Scan a value (token) to Eol and optionally skip over Eol: - Handles intermediate or last line (with no Eol). - Token contains the intermediate text (excluding the newline). Returns: False if started at Eos or a non-valid Token else true. |
M: string LineRemainder() |
Return the remainder of the current line without changing the Index position (mainly used for debugging or tracing). |
M: bool ScanWhile(Func<TextScanner, char, int, bool> predicate) |
Scan all characters while a predicate matches, or Eos is reached: - Predicate = Func: <this, current char, 0..n index from scan start, bool>. - Token contains the scanned characters. Returns: True if any characters are scanned (Index after last match) else false (Index unchanged). |
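A sketch of ScanWhile where the predicate also receives the 0-based offset from the scan start, which makes a rule such as "letter first, then letters or digits" easy to express. The real predicate also receives the TextScanner itself; that argument is omitted here, and the ScanWhile and IsIdentChar helpers are hypothetical.

```csharp
using System;

// Hypothetical sketch: predicate gets (current char, 0-based offset from scan start).
static (bool ok, string token, int index) ScanWhile(string source, int index,
    Func<char, int, bool> predicate)
{
    int i = index;
    while (i < source.Length && predicate(source[i], i - index))
        i++;
    return i > index
        ? (true, source.Substring(index, i - index), i)   // Index after last match
        : (false, string.Empty, index);                   // nothing scanned
}

// Identifier rule: letter or '_' first, then letters, digits or '_'.
static bool IsIdentChar(char c, int offset) =>
    c == '_' || (offset == 0 ? char.IsLetter(c) : char.IsLetterOrDigit(c));

Console.WriteLine(ScanWhile("count1 = 0", 0, IsIdentChar).token);   // count1
Console.WriteLine(ScanWhile("1count", 0, IsIdentChar).ok);          // False
```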
M: bool ScanBlock(string blockStart, string blockEnd, bool isOpen = false) |
Scan a block delimited by blockStart and blockEnd: - Handles nesting. - Token contains the block content, excluding the block delimiters. Parameters: isOpen: False if Index is at the start of the block, true if Index is just inside it. Returns: True if not at the start of a non-open block, or for a valid block (Index positioned after the block). Otherwise false, an error is logged and Index is unchanged. |
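A sketch of the nesting rule described for ScanBlock (and SkipBlock): a depth counter rises on each blockStart and falls on each blockEnd, and the block ends when the depth returns to zero. The hypothetical ScanBlock helper uses ordinal comparison and returns (null, -1) where the library would log an error; both choices are assumptions.

```csharp
using System;

// Hypothetical sketch of nested block scanning. Returns the inner content and
// the index just after the closing blockEnd, or (null, -1) where the real
// methods would log an error instead.
static (string? content, int next) ScanBlock(string source, int index,
    string blockStart, string blockEnd, bool isOpen = false)
{
    if (blockStart.Length == 0 || blockEnd.Length == 0)
        throw new ArgumentException("Block delimiters must be non-empty.");

    bool At(int i, string s) => string.CompareOrdinal(source, i, s, 0, s.Length) == 0;

    if (!isOpen)
    {
        if (!At(index, blockStart))
            return (null, -1);          // not at the start of a block
        index += blockStart.Length;
    }

    int contentStart = index, depth = 1, i = index;
    while (i < source.Length)
    {
        if (At(i, blockEnd))            // checked first, so identical delimiters still close
        {
            if (--depth == 0)
                return (source.Substring(contentStart, i - contentStart), i + blockEnd.Length);
            i += blockEnd.Length;
        }
        else if (At(i, blockStart))
        {
            depth++;
            i += blockStart.Length;
        }
        else
        {
            i++;
        }
    }
    return (null, -1);                  // Eos reached before the block closed
}

var (body, next) = ScanBlock("{a{b}c} tail", 0, "{", "}");
Console.WriteLine($"{body} {next}");   // a{b}c 7
```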
Type Operations: | |
M: bool IsChType(Func<char, bool> predicate) |
Check if current character matches a predicate (without advancing Index). |
M: bool IsDigit() |
Check if current character is a Digit (via char.IsDigit()) (without advancing Index). |
M: bool IsLetter() |
Check if current character is a Letter (via char.IsLetter()) (without advancing Index). |
M: bool IsLetterOrDigit() |
Check if current character is a LetterOrDigit (via char.IsLetterOrDigit()) (without advancing Index). |
M: bool IsDecimal() |
Check if current character is a Decimal digit (IsDigit || '.') (without advancing Index). |
M: bool NumDecimal(out double value) |
Scan a decimal value of the form n*.n*. Returns: True with the parsed double in value, else false. |
M: bool NumInt(out int value) |
Scan an integer value of the form n*. Returns: True with the parsed int in value, else false. |
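A sketch of the n*.n* and n* forms scanned by NumDecimal and NumInt. It assumes ASCII digits, no sign and invariant-culture parsing (so '.' is always the decimal point); a lone "." is rejected and leaves the index unchanged. The helper names mirror the documented methods but are hypothetical stand-ins.

```csharp
using System;
using System.Globalization;

// Hypothetical sketch: ASCII digits only, no sign, invariant culture.
static bool IsDigit(char c) => c >= '0' && c <= '9';

// Scan "n*.n*" at index; on success set value and advance index.
static bool NumDecimal(string source, ref int index, out double value)
{
    int i = index;
    while (i < source.Length && IsDigit(source[i])) i++;
    if (i < source.Length && source[i] == '.') i++;
    while (i < source.Length && IsDigit(source[i])) i++;

    // TryParse rejects an empty span or a lone ".", so index stays unchanged.
    if (double.TryParse(source.AsSpan(index, i - index), NumberStyles.AllowDecimalPoint,
            CultureInfo.InvariantCulture, out value))
    {
        index = i;
        return true;
    }
    return false;
}

// Scan "n*" at index; NumberStyles.None allows digits only, and overflow fails.
static bool NumInt(string source, ref int index, out int value)
{
    int i = index;
    while (i < source.Length && IsDigit(source[i])) i++;

    if (int.TryParse(source.AsSpan(index, i - index), NumberStyles.None,
            CultureInfo.InvariantCulture, out value))
    {
        index = i;
        return true;
    }
    return false;
}

string expr = "3.25*12";
int pos = 0;
if (NumDecimal(expr, ref pos, out double d) && pos < expr.Length && expr[pos] == '*')
{
    pos++;
    if (NumInt(expr, ref pos, out int n))
        Console.WriteLine(d * n);   // 39
}
```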
M: bool GetDigit() |
If the current character is a digit, record it in Delim and advance Index; else return false with Index unchanged. |
Error Logging and Handling: | |
M: (int line, int col, int offset, string astext) GetLineAndColumn(int pos = -1) |
Return the line and column number for the given or current position in the source; used mainly for error reporting. Parameters: pos: Position (index) to get the line and column for, or -1 to use the current scan Index. Returns: Tuple: (line (1..n), col (1..n), offset (0..n), astext ("Ln l+1 Col c+1")). |
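A sketch of the position-to-line/column mapping used for error reports. The hypothetical LineAndColumn helper counts '\n' characters before the position, clamps out-of-range positions, and omits the offset field of the documented tuple; all of these are assumptions.

```csharp
using System;

// Hypothetical sketch: map a source position to a 1-based line and column
// by counting '\n' characters before it (offset field omitted).
static (int line, int col, string astext) LineAndColumn(string source, int pos)
{
    pos = Math.Clamp(pos, 0, source.Length);
    int line = 1, lineStart = 0;
    for (int i = 0; i < pos; i++)
    {
        if (source[i] == '\n')
        {
            line++;
            lineStart = i + 1;          // next line begins after the '\n'
        }
    }
    int col = pos - lineStart + 1;
    return (line, col, $"Ln {line} Col {col}");
}

string src = "let a = 1;\r\nlet b = ;\n";
Console.WriteLine(LineAndColumn(src, src.IndexOf(';', 12)).astext);   // Ln 2 Col 9
```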
M: string GetUptoLine(int pos = -1, int lastNoofLines = 0) |
Get all text up to and including the line containing pos (excluding Eol): - Optionally only get the lastNoofLines if > 0. Parameters: pos: Position or -1 for current Index position. |
M: bool LogError(string errorMsg, string errorContext = "Parse error", int errIndex = -1) |
Log an error (see ScanErrorLog) with the given errorMsg and errorContext: - At the current Index position (default errIndex = -1) or at the given errIndex (>= 0). - Records the last 10 lines and the line and column number in ScanErrorLog for later display. Returns: Always false, so the caller can return the result directly. |
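A sketch of the "always returns false" idiom, with hypothetical stand-ins (errors, index, LogError, ExpectChar) for the scanner, its Index and its ScanErrorLog. The point it illustrates is that a failing parse step can report the error and return in a single statement.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins for the scanner's Index and ScanErrorLog.
var errors = new List<string>();
int index = 0;

// Mirrors the documented signature: errIndex = -1 means "at the current Index".
bool LogError(string errorMsg, string errorContext = "Parse error", int errIndex = -1)
{
    int at = errIndex >= 0 ? errIndex : index;
    errors.Add($"{errorContext}: {errorMsg} at {at}");
    return false;                       // always false
}

// A parse step can report and fail in one statement.
bool ExpectChar(string src, char c)
{
    if (index < src.Length && src[index] == c)
    {
        index++;
        return true;
    }
    return LogError($"expected '{c}'");
}

index = 2;
Console.WriteLine(ExpectChar("x 1", '='));   // False
Console.WriteLine(errors[0]);                // Parse error: expected '=' at 2
```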
P: bool IsError |
Return current scanner Error status. |