## regex -- Oniguruma (Onigmo) regular expressions
### Overview
This module provides regular expressions based on [Onigmo](https://github.com/k-takata/Onigmo) fork of [Oniguruma](http://www.geocities.jp/kosako3/oniguruma/)
library. it uses Ruby grammar with several customizable syntax modifications:
- by default, Dao string patterns are mimicked: `%` is used instead of `\`, whitespace characters are ignored outside of `[...]` groups
- one-line comments starting with `#` can be used
- implicit spacing mode: outside of `[...]`, a standalone whitespace character or `\r\n` are interpreted as `\\s*`, and a pair of equal whitespace characters
is interpreted as `\\s+`
Grammar description can be found in `Onigmo/doc/RE`.
### Installation
Currently, Onigmo should be built manually from the source provided with the module. In order to link it statically to `regex` module, Onigmo should be
configured as `./configure CFLAGS=-fPIC LFLAGS=-fPIC` to enable position-independent code (on Windows, the relevant makefiles need to be edited). Consult
`Onigmo/README` for details as to how to build the library on various platforms.
### Index
namespace [re](#re)
invar class [Regex](#regex)
- [.pattern](#pattern)(_self_: Regex) => string
- [.groupCount](#groupcount)(_self_: Regex) => int
- [.ignoresCase](#ignorescase)(_self_: Regex) => bool
- [fetch](#fetch)(_self_: Regex, _target_: string, _group_: int|string = 0, _start_ = 0, _end_ = -1) => string
- [search](#search)(_self_: Regex, _target_: string, _start_ = 0, _end_ = -1) => Match|none
- [matches](#matches)(_self_: Regex, _target_: string) => bool
- [extract](#extract)(_self_: Regex, _target_: string, _matchType_: enum<both,matched,unmatched> = $matched) => list<string>
- [replace](#replace)(_self_: Regex, _target_: string, _format_: string, _start_ = 0, _end_ = -1) => string
- [scan](#scan)(_self_: Regex, _target_: string, _start_ = 0, _end_ = -1)[_found_: Match => none|@V] => list<@V>
- [replace](#replace2)(_self_: Regex, _target_: string, _start_ = 0, _end_ = -1)[_found_: Match => string] => string
- [iter](#iter)(_self_: Regex, _target_: string, _start_ = 0, _end_ = -1) => Iter
invar class [Match](#match)
- [string](#string)(_self_: Match, _group_: int|string = 0) => string
- [size](#size)(_self_: Match, _group_: int|string = 0) => int
- [start](#start)(_self_: Match, _group_: int|string = 0) => int
- [end](#end)(_self_: Match, _group_: int|string = 0) => int
- [.groupCount](#groupcount2)(_self_: Match) => int
class [Iter](#iter2)
- [for](#for)(_self_: Iter, _iterator_: ForIterator)
- [[]](#index)(_self_: Iter, _index_: ForIterator) => Match
Functions:
- [compile](#compile)(_pattern_: string) => Regex
- [compile](#compile)(_pattern_: string, _options_: enum<strictSpacing;impliedSpacing;ignoreCase;allowComments;useBackslash>) => Regex
### Classes
#### `re::Regex`
Regular expression using [Onigmo fork](https://github.com/k-takata/Onigmo) of [Oniguruma](http://www.geocities.jp/kosako3/oniguruma/) library with Ruby grammar as backend. See [compile()(#compile) for usage details.
#### Methods
```ruby
.pattern(self: Regex) => string
```
String pattern
__Note:__ The pattern is stored in canonical form, i.e. with strict spacing, '\' as escape character and without comments
```ruby
.groupCount(self: Regex) => int
```
Number of capture groups in the pattern
```ruby
.ignoresCase(self: Regex) => bool
```
Case-insensitivity
```ruby
fetch(self: Regex, target: string, group: int|string = 0, start = 0, end = -1) => string
```
Finds the first match in *target* in the range [*start*; *end*] and returns sub-match specified by *group*.
__Note:__ For the interpretation of group numbers, see [Match](#match)
**Errors:** `Param` in case of invalid *group* or matching range
```ruby
search(self: Regex, target: string, start = 0, end = -1) => Match|none
```
Returns the first match in *target* in the range [*start*; *end*], or `none` if no match was found
**Errors:** `Param` in case of invalid matching range
```ruby
matches(self: Regex, target: string) => bool
```
Checks if the entire *target* is matched by the regex
```ruby
extract(self: Regex, target: string, matchType: enum = $matched) => list
```
Returns all matches in *target* (or unmatched, or both, depending on *matchType*)
```ruby
replace(self: Regex, target: string, format: string, start = 0, end = -1) => string
```
Replaces all matches in *target* in the range [*start*; *end*] with *format* string. Returns the entire resulting string. *format* may contain backreferences
in the form '$<group number from 0 to 9>' or '$(<group name>)'; '$$' can be to escape '$'
**Errors:** `Param` in case of invalid matching range, `Regex` in case of invalid backreference
```ruby
scan(self: Regex, target: string, start = 0, end = -1)[found: Match => none|@V] => list<@V>
```
Iterates over all matches in *target* in the range [*start*; *end*], yielding each match as *found*. Returns the list of values obtained from the code
section
**Errors:** `Param` in case of invalid matching range
```ruby
replace(self: Regex, target: string, start = 0, end = -1)[found: Match => string] => string
```
Iterates over all matches in *target*, yielding each of them as *found*. Returns the string formed by replacing each match in *target* by the corresponding
string returned from the code section
**Errors:** `Param` in case of invalid matching range
```ruby
iter(self: Regex, target: string, start = 0, end = -1) => Iter
```
Returns `for` iterator to iterate over all matches in *target* in the range [*start*; *end*].
__Note:__ Changing *target* has no effect on the iteration process (the iterator will still be bound to the original string)
**Errors:** `Param` in case of invalid matching range
------
#### `re::Match`
Single regular expression match providing information on matched sub-string and individual captured groups.
*group* parameter in `Match` methods may either be a group number or its name.
Group number is interpreted the following way:
- *group* == 0 -- entire matched sub-string
- *group* > 0 and *group* <= `groupCount()` -- corresponding sub-match
- *group* < 0 or *group* > `groupCount()` -- not permitted
If *group* is a name, the last group in the pattern with this name is assumed (at least one such group must exist).
#### Methods
```ruby
string(self: Match, group: int|string = 0) => string
```
Sub-string captured by *group*
**Errors:** `Param` in case of invalid *group*
```ruby
size(self: Match, group: int|string = 0) => int
```
Size of the sub-string captured by *group*
**Errors:** `Param` in case of invalid *group*
```ruby
start(self: Match, group: int|string = 0) => int
```
Start position of the sub-string captured by *group*
**Errors:** `Param` in case of invalid *group*
```ruby
end(self: Match, group: int|string = 0) => int
```
End position of the sub-string captured by *group*
**Errors:** `Param` in case of invalid *group*
```ruby
.groupCount(self: Match) => int
```
Number of captured groups
------
#### `re::Iter`
`for` iterator to iterate over regular expression matches in a string
```ruby
for(self: Iter, iterator: ForIterator)
```
```ruby
[](self: Iter, index: ForIterator) => Match
```
### Functions
```ruby
compile(pattern: string) => Regex
compile(pattern: string, options: enum) => Regex
```
Constructs regular expression from *pattern* using specified *options* (if provided).
Default options mimic Dao string patterns syntax:
- free spacing -- whitespace is ignored outside of '[...]'
- '%' is used as control character
- the pattern is treated as case-sensitive
This behavior can be overridden with the following values of *options*:
- `$strictSpacing` -- whitespace characters in the pattern are treated 'as is' (canonical behavior)
- `$impliedSpacing` -- outside of '[ ... ]', a standalone whitespace character or '\r\n' are interpreted as '\\s*',
and a pair of equal whitespace characters is interpreted as '\\s+'
- `$ignoreCase` -- the pattern is treated as case-insensitive
- `$allowComments` -- all characters starting from '#' up to '\n' (or end of string) are ignored as comments ('#' can be escaped)
- `$useBackslash` -- use canonical '\' as control character
__Note:__ Regular expression engine presumes UTF-8-encoded patterns
**Errors:** `Param` in case of conflicting spacing options, `Regex` in case of regular expression grammar error