class TextScanner : ILogScanError

A comprehensive, low-level core scanner suitable for general text parsing and lexer construction.

The TextScanner is used to extract tokens from text, check for strings or characters (delimiters), skip over text, etc.

Incorporates the following basic operations:

  • Maintains an Index (a scan pointer) for all scanning operations.
  • Checks for characters or strings at the current Index (or Index + offset).
  • Scans up to characters or strings.
  • Records scanned text in Token for later access.
  • Provides several skipping operations.
  • Records a delimiter in Delim for some applicable operations.
  • Logs errors via ScanErrorLog (which also records the position of the error for later reporting).

In the documentation below the following abbreviations are used:

  • Eos: End of Source (or string)
  • Eol: End of Line
Members Description
Constructor:
C: TextScanner(string source, ScanErrorLog errorLog = null) Create a TextScanner for the given source string, using an 'internal' ScanErrorLog (when errorLog == null) or the supplied 'external' one.
Implementation:
P: char Delim Get last delimiter logged (where applicable).
P: string Match Get the matching string for the last IsAnyString or SkipToAnyStr method call.
P: ScanErrorLog ErrorLog Get/Set the bound ScanErrorLog.
Token operations: Note: several scanning operations record the scanned text in Token. The following members operate on this token.
P: bool IsToken Check if a Token currently exists.
M: void SetTokenRange(int startIndex, int endIndex) Manually set the Token start and end index, which will be used to retrieve the Token on the next call:
- The scanner automatically maintains these indexes for any operation that records a token.
- This should only be used in special cases (e.g. for extensions). The values are set to 0 (empty Token) if out of range.

Parameters:
startIndex: The zero-based starting position, or less-than zero for the current index position.
endIndex: The zero-based ending position. Adjusts to Eos if less-than zero or out of range.
P: string Token Get the current Token, or string.Empty if there is none.
P: string TrimToken Get the current Token, trimmed.
M: bool ValidToken() Check if the current Token is not null or WhiteSpace.
Source Management:
M: void Insert(string text) Insert text at the current Index, and continue scanning from there.
M: void InsertLine(string text) Insert text and newline (\r\n or \n) at the current Index, and continue scanning from there.
M: void Remove(int startIndex) Remove a section of the Source string, from startIndex up to, but excluding, current Index.
M: void SetSource(string source) Set the Scanner Source from a string and reset Index to the start.
M: string SubSource(int startIndex, int length = -1) Retrieve a substring of the scanner Source:
- Mainly used for debugging and tracing.

Parameters:
startIndex: The zero-based starting position, or less-than zero for the current index position.
length: The number of characters to retrieve, Adjusts to Eos if less-than zero or out of range.

Returns:
A substring starting at startIndex containing length characters,
or an empty string if startIndex is greater than the source length or length is zero.
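The clamping rules above can be illustrated with a small stand-alone C# sketch. SubSource here is a hypothetical local function that takes the source and the current Index as explicit parameters, rather than reading scanner state as the real member does.

```csharp
using System;

// Hypothetical stand-alone sketch of the SubSource rules (not the library source).
static string SubSource(string source, int currentIndex, int startIndex, int length = -1)
{
    if (startIndex < 0)
        startIndex = currentIndex;                 // "current index position"
    if (startIndex >= source.Length || length == 0)
        return string.Empty;                       // past the end, or nothing requested
    int available = source.Length - startIndex;
    if (length < 0 || length > available)
        length = available;                        // adjust to Eos
    return source.Substring(startIndex, length);
}

string s = "Hello, world";
Console.WriteLine(SubSource(s, 7, -1));       // world
Console.WriteLine(SubSource(s, 7, 0, 5));     // Hello
Console.WriteLine(SubSource(s, 7, 0, 99));    // Hello, world
```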
Index Management:
P: int Index Get: current scan index.
Set: scan index (0 = start, < 0 or > length = end, else intermediate).
Core Utilities:
M: int CountCh(char c) Count the consecutive characters matching c, starting at Index, and advance Index past them.
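A stand-alone C# sketch of what CountCh describes: count a run of one character and advance past it. The signature (explicit source and ref index) is a hypothetical simplification of the instance method.

```csharp
using System;

// Hypothetical stand-alone sketch of CountCh (not the library source): count the
// run of c starting at index and move index just past it.
static int CountCh(string source, ref int index, char c)
{
    int start = index;
    while (index < source.Length && source[index] == c)
        index++;
    return index - start;   // 0 if the character at index is not c (index unchanged)
}

// Typical use: measure a Markdown-style heading level.
string heading = "### Setup";
int pos = 0;
int level = CountCh(heading, ref pos, '#');
Console.WriteLine($"level={level}, next='{heading[pos]}'");   // level=3, next=' '
```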
M: bool IsAnyCh(string chars) Check if character at Index is one of the chars.

Returns:
True: if found, advances the Index and logs the char in Delim.
False: if not found and Index is unchanged.
M: bool IsAnyString(IEnumerable<string> matchStrings, bool advanceIndex = true, StringComparison comp = StringComparison.InvariantCultureIgnoreCase) Check if text at Index equals any string in matchStrings and optionally advance the Index if it matches.
- Match contains the matching string.

Parameters:
matchStrings: Enumerable set of strings to match.
advanceIndex: Advance Index to just after match (default) else not.
comp: Comparison type (default = StringComparison.InvariantCultureIgnoreCase).
M: bool IsAnyString(string matchStrings, bool advanceIndex = true, StringComparison comp = StringComparison.InvariantCultureIgnoreCase) Check if text at Index equals any string in delimited matchStrings and optionally advance the Index if it matches.
- Match contains the matching string.

Parameters:
matchStrings: Delimited strings, where the first character is the delimiter (e.g. "|s1|s2|...").
advanceIndex: Advance Index to just after match (default) else not.
comp: Comparison type (default = StringComparison.InvariantCultureIgnoreCase).
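The delimited form of matchStrings can be sketched as below: the first character is taken as the delimiter and the rest of the text is split on it. SplitDelimited and this IsAnyString are hypothetical stand-alone local functions; trying the candidates in list order (so "<=" should be listed before "<") is an assumption about match order, not documented behaviour.

```csharp
using System;

// Hypothetical helper: "|s1|s2|" -> ["s1", "s2"]; the first character is the delimiter.
static string[] SplitDelimited(string matchStrings) =>
    string.IsNullOrEmpty(matchStrings)
        ? Array.Empty<string>()
        : matchStrings.Substring(1).Split(matchStrings[0], StringSplitOptions.RemoveEmptyEntries);

// Hypothetical stand-alone sketch of IsAnyString(string matchStrings, ...): candidates
// are tried in list order and the first one found at index wins.
static bool IsAnyString(string source, ref int index, string matchStrings, out string match,
                        bool advanceIndex = true,
                        StringComparison comp = StringComparison.InvariantCultureIgnoreCase)
{
    foreach (string candidate in SplitDelimited(matchStrings))
    {
        if (index + candidate.Length <= source.Length &&
            string.Compare(source, index, candidate, 0, candidate.Length, comp) == 0)
        {
            match = candidate;                       // what the scanner exposes as Match
            if (advanceIndex) index += candidate.Length;
            return true;
        }
    }
    match = string.Empty;
    return false;                                    // index unchanged
}

// "<=" is listed before "<" so the longer operator wins.
string src = "<= b";
int i = 0;
bool ok = IsAnyString(src, ref i, "|<=|<|>=|>", out string op);
Console.WriteLine($"{ok} '{op}' index={i}");      // True '<=' index=2
```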
M: bool IsCh(char c) Check if the character at Index matches c and advance Index if true.
P: bool IsEol Query if Index is at End of Line.
P: bool IsEos Check if Index is at End of Source.
P: bool IsEosOrEol Query if Index is at Eos or Eol.
M: bool IsPeekAnyCh(string chars, int offset = 0) Check if character at relative offset to Index matches any one of the chars (index unchanged).
M: bool IsPeekCh(char c, int offset = 0) Check if character at relative offset to Index matches c (index unchanged).
M: bool IsString(string matchString, bool advanceIndex = true, StringComparison comp = StringComparison.InvariantCultureIgnoreCase) Check if text at Index equals matchString and optionally advance the Index if it matches.

Parameters:
matchString: String to match.
advanceIndex: Advance Index to just after match (default) or not.
comp: Comparison type (default = StringComparison.InvariantCultureIgnoreCase).
M: char NextCh() Get the character at Index and increment Index, or return '0' at Eos.
M: char PeekCh(int offset = 0) Get character at relative offset to Index (index unchanged).

Returns:
Character or Eos ('0') if out of range.
M: void ToEos() Advance Index to Eos.
Skipping Operations:
M: bool Skip(char skipChar) Skip while character is skipChar.

Returns:
True if not Eos after skipping else false.
M: bool SkipAny(string skipChars) Skip while character is any of the skipChars.

Returns:
True if not Eos after skipping else false.
M: bool SkipTo(char termChar, bool skipOver = false) Skip until the termChar is found:
- Optionally skip over the delimiter if skipOver is true.

Returns:
True: Found and Index at matching char or next if skipOver = true.
False: Not found or Eos and Index unchanged.
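A minimal stand-alone sketch of the SkipTo contract: Index lands on termChar, or just after it with skipOver, and stays unchanged on failure. Passing the source and a ref index explicitly is a hypothetical simplification of the instance method.

```csharp
using System;

// Hypothetical stand-alone sketch of SkipTo(termChar, skipOver), passing the source
// and index explicitly instead of using scanner state.
static bool SkipTo(string source, ref int index, char termChar, bool skipOver = false)
{
    int found = index < source.Length ? source.IndexOf(termChar, index) : -1;
    if (found < 0)
        return false;                        // not found or at Eos: index unchanged
    index = skipOver ? found + 1 : found;    // on termChar, or just after it
    return true;
}

string text = "key=value";
int p = 0;
SkipTo(text, ref p, '=');                    // p == 3 (at '=')
SkipTo(text, ref p, '=', skipOver: true);    // p == 4 (just after '=')
bool missing = SkipTo(text, ref p, ';');     // false, p still 4
Console.WriteLine($"{p} {missing}");         // 4 False
```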
M: bool SkipToAny(string termChars, bool skipOver = false) Skip until any one of the termChars is found.
- Delim contains the matching character.
- Optionally skip over the delimiter if skipOver is true.

Returns:
True: Found and Index at matching char or next if skipOver = true.
False: Not found or Eos and Index unchanged.
M: bool SkipToStr(string text, bool skipOver = false, StringComparison comp = StringComparison.InvariantCultureIgnoreCase) Skip up to given text and optionally skip over it if skipOver is true.

Parameters:
comp: Comparison type (default = StringComparison.InvariantCultureIgnoreCase).

Returns:
True: Found and Index at start of matching text or just after if skipOver = true.
False: Not found or Eos and Index unchanged.
M: bool SkipToAnyStr(IEnumerable<string> matchStrings, bool skipOver = false, StringComparison comp = StringComparison.InvariantCultureIgnoreCase) Skip up to first occurrence of any string in matchStrings and optionally skip over the matching string.
- Match contains the matching string.

Parameters:
matchStrings: Enumerable set of strings.
skipOver: Advance Index to just after match (default = false) else not.
comp: Comparison type (default = StringComparison.InvariantCultureIgnoreCase).

Returns:
True: Found and Index at start of matching text or just after if skipOver = true.
False: Not found or Eos and Index unchanged.
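One way to find the "first occurrence of any string" is to walk forward one position at a time and test every candidate there, so the earliest position in the source wins regardless of list order. The sketch below is a hypothetical stand-alone version; resolving ties at the same position in list order is an assumption.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-alone sketch of SkipToAnyStr(IEnumerable<string>, ...): the
// earliest position in the source wins; if several candidates match at the same
// position, the one listed first wins (an assumption, not documented behaviour).
static bool SkipToAnyStr(string source, ref int index, IEnumerable<string> matchStrings,
                         out string match, bool skipOver = false,
                         StringComparison comp = StringComparison.InvariantCultureIgnoreCase)
{
    var candidates = new List<string>(matchStrings);   // enumerate the input only once
    for (int pos = index; pos < source.Length; pos++)
    {
        foreach (string candidate in candidates)
        {
            if (candidate.Length > 0 && pos + candidate.Length <= source.Length &&
                string.Compare(source, pos, candidate, 0, candidate.Length, comp) == 0)
            {
                match = candidate;
                index = skipOver ? pos + candidate.Length : pos;
                return true;
            }
        }
    }
    match = string.Empty;
    return false;                                       // index unchanged
}

// "else" occurs before "end" in the source, so it is found first.
string script = "if x then y ELSE z end";
int sp = 0;
SkipToAnyStr(script, ref sp, new[] { "end", "else" }, out string found);
Console.WriteLine($"{found} at {sp}");                  // else at 12
```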
M: bool SkipToAnyStr(string matchStrings, bool skipOver = false, StringComparison comp = StringComparison.InvariantCultureIgnoreCase) Skip up to first occurrence of any string in delimited matchStrings and optionally skip over the matching string.
- Match contains the matching string.

Parameters:
matchStrings: Delimited strings, where the first character is the delimiter (e.g. "|s1|s2|...").
skipOver: Advance Index to just after match (default = false) else not.
comp: Comparison type (default = StringComparison.InvariantCultureIgnoreCase).

Returns:
True: Found and Index at start of matching text or just after if skipOver = true.
False: Not found or Eos and Index unchanged.
M: bool SkipToEol(bool skipOver = true) Skip to Eol or Eos (last line).
- Optionally skip over the Eol if skipOver is true.

Returns:
False if started at Eos else True.
M: bool SkipEol() Skip one NewLine. Must currently be at the newline (else ignored).

Returns:
True if not Eos after skipping else false.
M: bool SkipConsecEol() Skip all consecutive NewLines. Must currently be at a newline (else ignored).

Returns:
True if not Eos after skipping else false.
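SkipEol and SkipConsecEol can be sketched as follows, assuming a newline is either \r\n or \n (the two forms named for InsertLine). These are hypothetical stand-alone local functions, not the library source; treating a lone '\r' as ordinary text is an assumption.

```csharp
using System;

// Hypothetical stand-alone sketches of SkipEol / SkipConsecEol, assuming a newline
// is "\r\n" or "\n"; a lone '\r' is not treated as Eol.
static bool SkipEol(string source, ref int index)
{
    if (index + 1 < source.Length && source[index] == '\r' && source[index + 1] == '\n')
        index += 2;
    else if (index < source.Length && source[index] == '\n')
        index += 1;
    // Not at a newline: ignored, index unchanged.
    return index < source.Length;   // true if not at Eos afterwards
}

static bool SkipConsecEol(string source, ref int index)
{
    int before;
    do
    {
        before = index;
        SkipEol(source, ref index);
    } while (index != before);      // stop once a call no longer advances
    return index < source.Length;
}

string doc = "a\r\n\n\r\nb";
int pos = 1;                        // at the first "\r\n"
SkipConsecEol(doc, ref pos);
Console.WriteLine(doc[pos]);        // b
```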
M: void SkipWhile(Func<char, bool> predicate) Skip all characters while the predicate matches (returns true), or Eos is reached.
M: bool SkipBlock(string blockStart, string blockEnd, bool isOpen = false) Skip a block delimited by blockStart and blockEnd:
- Handles Nesting.

Parameters:
isOpen: false if the current Index is at the start of the block (at blockStart), true if Index is already just inside the block.

Returns:
True if isOpen is false and Index is not at blockStart (nothing to skip), or if a valid block was skipped (Index positioned after the block).
Otherwise false, and an error is logged (Index unchanged).
Scanning Operations:
M: bool ScanTo(char delim, bool orToEos = false, bool skipOver = false) Scans up to the delim or to Eos (if orToEos is true):
- Optionally skip over the delimiter if skipOver is true.
- Token contains the intermediate text (excluding delimiter).

Returns:
True: Delimiter found or orToEos is true. Index at Eos, at the delimiter, or just after the delimiter if skipOver is true.
False: Started at Eos or delimiter not found (and orToEos is false). Index unchanged.
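A stand-alone sketch of the ScanTo contract, showing how Token excludes the delimiter and how orToEos and skipOver change the final Index. The out token parameter stands in for the Token property; the names and signature are hypothetical.

```csharp
using System;

// Hypothetical stand-alone sketch of ScanTo(delim, orToEos, skipOver): token receives
// the text between the start index and the delimiter (delimiter excluded).
static bool ScanTo(string source, ref int index, out string token, char delim,
                   bool orToEos = false, bool skipOver = false)
{
    token = string.Empty;
    if (index >= source.Length)
        return false;                           // started at Eos
    int found = source.IndexOf(delim, index);
    if (found < 0)
    {
        if (!orToEos) return false;             // index unchanged
        found = source.Length;                  // take the rest of the source
    }
    token = source.Substring(index, found - index);
    index = (skipOver && found < source.Length) ? found + 1 : found;
    return true;
}

string csv = "alpha,beta";
int at = 0;
ScanTo(csv, ref at, out string first, ',', skipOver: true);  // "alpha", at == 6
ScanTo(csv, ref at, out string second, ',', orToEos: true);  // "beta", at == 10 (Eos)
Console.WriteLine($"{first} {second} {at}");                  // alpha beta 10
```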
M: bool ScanToAny(string delims, bool orToEos = false) Scans up to any character in delims or to Eos (if orToEos is true):
- Token contains the intermediate text (excluding delimiter).

Returns:
True: Delimiter found or orToEos is true. Index at delimiter or Eos.
False: Started at Eos, delimiter not found (and orToEos is false) or delims is blank. Index unchanged.
M: bool ScanToStr(string findString, StringComparison comp = StringComparison.InvariantCultureIgnoreCase) Scan up to a match of findString:
- Token contains the intermediate text (excluding findString).

Parameters:
comp: Comparison type (default = StringComparison.InvariantCultureIgnoreCase).

Returns:
True: findString found and Index directly after findString.
False: findString not found and Index remains at original position.
M: bool ScanToAnyStr(IEnumerable<string> matchStrings, bool skipOver = false, StringComparison comp = StringComparison.InvariantCultureIgnoreCase) Scan up to first occurrence of any string in matchStrings:
- Token contains the intermediate text (excluding matching string).
- Match contains the matching string.

Parameters:
matchStrings: Enumerable set of strings.
skipOver: Advance Index to just after match (default = false) else not.
comp: Comparison type (default = StringComparison.InvariantCultureIgnoreCase).

Returns:
True: Found and Index at start of matching text or just after if skipOver = true.
False: Not found or Eos. Index unchanged.
M: bool ScanToAnyStr(string matchStrings, bool skipOver = false, StringComparison comp = StringComparison.InvariantCultureIgnoreCase) Scan up to first occurrence of any string in delimited matchStrings.
- Token contains the intermediate text (excluding matching string).
- Match contains the matching string.

Parameters:
matchStrings: Delimited strings, where the first character is the delimiter (e.g. "|s1|s2|...").
skipOver: Advance Index to just after match (default = false) else not.
comp: Comparison type (default = StringComparison.InvariantCultureIgnoreCase).

Returns:
True: Found and Index at start of matching text or just after if skipOver = true.
False: Not found or Eos. Index unchanged.
M: bool ScanToEol(bool skipEol = true) Scan to Eol and optionally skip over Eol:
- Handles intermediate or last line (with no Eol).
- Token contains the intermediate text (excluding the newline, may be empty).

Returns:
False if started at Eos else true.
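The line-splitting behaviour of ScanToEol (Token may be empty, the last line may lack an Eol) is sketched below, assuming \r\n or \n newlines. This is a hypothetical stand-alone local function; the out token parameter stands in for the Token property.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-alone sketch of ScanToEol: token receives the line without its
// "\r\n" or "\n"; the last line may have no Eol. Returns false only when already at Eos.
static bool ScanToEol(string source, ref int index, out string token, bool skipEol = true)
{
    token = string.Empty;
    if (index >= source.Length)
        return false;
    int end = source.IndexOf('\n', index);
    if (end < 0)
        end = source.Length;                          // last line without Eol
    int textEnd = end;
    if (end < source.Length && end > index && source[end - 1] == '\r')
        textEnd--;                                    // drop the '\r' of "\r\n"
    token = source.Substring(index, textEnd - index); // may be empty
    index = skipEol ? Math.Min(end + 1, source.Length) : textEnd;
    return true;
}

string lines = "first\r\n\r\nthird";
int cur = 0;
var seen = new List<string>();
while (ScanToEol(lines, ref cur, out string line))
    seen.Add(line);
Console.WriteLine(string.Join("|", seen));            // first||third
```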
M: bool ValueToEol(bool skipEol = true) Scan a value (token) to Eol and optionally skip over Eol:
- Handles intermediate or last line (with no Eol).
- Token contains the intermediate text (excluding the newline).

Returns:
False if started at Eos or a non-valid Token else true.
M: string LineRemainder() Return the remainder of the current line without changing the Index position.
(Mainly used for debugging or tracing.)
M: bool ScanWhile(Func<TextScanner, char, int, bool> predicate) Scan all characters while a predicate matches, or Eos is reached:
- The predicate receives this scanner, the current char and its 0..n offset from the scan start, and returns true to continue scanning.
- Token contains the scanned characters.

Returns:
True if any characters are scanned (Index after last match) else false (Index unchanged).
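A stand-alone sketch of ScanWhile that uses the offset argument, here to accept an identifier whose first character follows a stricter rule. The documented predicate also receives the scanner itself; this hypothetical version drops that argument so it can run without the library.

```csharp
using System;

// Hypothetical stand-alone sketch of ScanWhile. The documented predicate is
// Func<TextScanner, char, int, bool>; this version drops the scanner argument and
// passes (current char, offset from scan start).
static bool ScanWhile(string source, ref int index, out string token, Func<char, int, bool> predicate)
{
    int start = index;
    while (index < source.Length && predicate(source[index], index - start))
        index++;
    token = source.Substring(start, index - start);
    return index > start;        // false (index unchanged) if nothing matched
}

// Identifier: a letter or '_' first (offset 0), then letters, digits or '_'.
string code = "_count1 = 0";
int ip = 0;
ScanWhile(code, ref ip, out string ident,
    (ch, n) => ch == '_' || (n == 0 ? char.IsLetter(ch) : char.IsLetterOrDigit(ch)));
Console.WriteLine($"{ident} {ip}");   // _count1 7
```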
M: bool ScanBlock(string blockStart, string blockEnd, bool isOpen = false) Scan a block delimited by blockStart and blockEnd:
- Handles Nesting.
- Token contains the block content excluding the block delimiters.

Parameters:
isOpen: false if the current Index is at the start of the block (at blockStart), true if Index is already just inside the block.

Returns:
True if isOpen is false and Index is not at blockStart (nothing to scan), or if a valid block was scanned (Index positioned after the block).
Otherwise false, and an error is logged (Index unchanged).
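Nesting in ScanBlock (and SkipBlock) comes down to keeping a depth count of unmatched block starts. The sketch below is a hypothetical stand-alone version using ordinal comparison (an assumption); like the description above, it returns true without scanning when a non-open block does not start at Index, and returns false with Index unchanged for an unterminated block.

```csharp
using System;

// Hypothetical helper: ordinal test for `what` starting at source[pos].
static bool At(string source, int pos, string what) =>
    pos + what.Length <= source.Length &&
    string.CompareOrdinal(source, pos, what, 0, what.Length) == 0;

// Hypothetical stand-alone sketch of ScanBlock with nesting: depth counts unmatched
// blockStart occurrences so an inner block does not end the outer one.
static bool ScanBlock(string source, ref int index, out string token,
                      string blockStart, string blockEnd, bool isOpen = false)
{
    token = string.Empty;
    int pos = index;
    if (!isOpen)
    {
        if (!At(source, pos, blockStart))
            return true;                      // not at a block: nothing to scan
        pos += blockStart.Length;
    }
    int contentStart = pos, depth = 1;
    while (pos < source.Length)
    {
        if (At(source, pos, blockEnd))        // checked first, so equal start/end never nest
        {
            if (--depth == 0)
            {
                token = source.Substring(contentStart, pos - contentStart);
                index = pos + blockEnd.Length;   // just after the block
                return true;
            }
            pos += blockEnd.Length;
        }
        else if (At(source, pos, blockStart))
        {
            depth++;
            pos += blockStart.Length;
        }
        else
        {
            pos++;
        }
    }
    return false;   // unterminated: index unchanged (the library also logs an error)
}

string expr = "(a * (b + c)) - d";
int bp = 0;
bool ok = ScanBlock(expr, ref bp, out string inner, "(", ")");
Console.WriteLine($"{ok} [{inner}] {bp}");   // True [a * (b + c)] 13
```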
Type Operations:
M: bool IsChType(Func<char, bool> predicate) Check if current character matches a predicate (without advancing Index).
M: bool IsDigit() Check if current character is a Digit (via char.IsDigit()) (without advancing Index).
M: bool IsLetter() Check if current character is a Letter (via char.IsLetter()) (without advancing Index).
M: bool IsLetterOrDigit() Check if current character is a LetterOrDigit (via char.IsLetterOrDigit()) (without advancing Index).
M: bool IsDecimal() Check if current character is a Decimal digit (IsDigit || '.') (without advancing Index).
M: bool NumDecimal(out double value) Scan a decimal value of the form n*.n*.

Returns:
True and output double else false.
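A sketch of the n*.n* form accepted by NumDecimal: digits, an optional '.', then more digits. Parsing with CultureInfo.InvariantCulture and rejecting signs and exponents are assumptions of this hypothetical stand-alone version.

```csharp
using System;
using System.Globalization;

// Hypothetical stand-alone sketch of NumDecimal: consume digits, an optional '.',
// then more digits (the n*.n* form) and parse with the invariant culture, so '.'
// is the separator whatever the current locale. Signs and exponents are not handled.
static bool NumDecimal(string source, ref int index, out double value)
{
    int pos = index;
    while (pos < source.Length && char.IsDigit(source[pos])) pos++;
    if (pos < source.Length && source[pos] == '.') pos++;
    while (pos < source.Length && char.IsDigit(source[pos])) pos++;

    string text = source.Substring(index, pos - index);
    if (!double.TryParse(text, NumberStyles.AllowDecimalPoint,
                         CultureInfo.InvariantCulture, out value))
        return false;                    // e.g. "" or "." alone: index unchanged
    index = pos;
    return true;
}

string input = "3.25kg";
int np = 0;
bool parsed = NumDecimal(input, ref np, out double amount);
Console.WriteLine($"{parsed} {amount.ToString(CultureInfo.InvariantCulture)} '{input.Substring(np)}'");
// True 3.25 'kg'
```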
M: bool NumInt(out int value) Scan an integer value of the form n*.

Returns:
True and output int else false.
M: bool GetDigit() If the current character is a digit, record it in Delim, advance Index and return true; otherwise return false with Index unchanged.
Error Logging and Handling:
M: (int line, int col, int offset, string astext) GetLineAndColumn(int pos = -1) Return the line and column number for the given or current position in the source. Used mainly for error reporting.

Parameters:
pos: Position (index) to get line and column for.
If this value is -1, use the current scan Index.

Returns:
Tuple: (line (1..n), col (1..n), offset (0..n), astext ("Ln l+1 Col c+1")).
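Line and column numbers can be derived by counting '\n' characters before the position, as in this hypothetical stand-alone sketch. It computes only the 1-based line and col; the library's offset and astext fields are not reproduced.

```csharp
using System;

// Hypothetical sketch: derive a 1-based line and column for a position by counting
// '\n' characters before it (offset and astext are not computed here).
static (int line, int col) LineAndColumn(string source, int pos)
{
    pos = Math.Clamp(pos, 0, source.Length);
    int line = 1, lineStart = 0;
    for (int i = 0; i < pos; i++)
    {
        if (source[i] == '\n')
        {
            line++;
            lineStart = i + 1;               // next line begins after the '\n'
        }
    }
    return (line, pos - lineStart + 1);
}

string src = "let x = 1;\nlet y = ;";
var (ln, col) = LineAndColumn(src, src.IndexOf(';', 11));
Console.WriteLine($"Ln {ln} Col {col}");   // Ln 2 Col 9
```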
M: string GetUptoLine(int pos = -1, int lastNoofLines = 0) Get all text up to and including the line containing pos (excluding Eol):
- Optionally only get the lastNoofLines if > 0.

Parameters:
pos: Position or -1 for current Index position.
M: bool LogError(string errorMsg, string errorContext = "Parse error", int errIndex = -1) Log an error (see ScanErrorLog) with the given errorMsg and errorContext:
- At current Index position (default errIndex = -1) or at given errIndex ( >= 0 ).
- Records the last 10 lines of source and the line and column number in ScanErrorLog for later display.

Returns:
Always false, so a caller can log and fail in one step (e.g. return LogError(...);).
P: bool IsError Return current scanner Error status.