This package provides a command-line tool named forgex-cli for interacting with Forgex (Fortran Regular Expression).
The forgex-cli command was originally part of the Forgex package, but was moved to this separate repository starting with Forgex version 3.5.
Clone the repository:
git clone https://github.com/shinobuamasaki/forgex-cli
Alternatively, download the latest source package:
wget https://github.com/ShinobuAmasaki/forgex-cli/archive/refs/tags/v3.5.tar.gz
In that case, decompress the archive file:
tar xvzf v3.5.tar.gz
Change directory to the cloned or decompressed location:
cd forgex-cli
Build with the Fortran Package Manager (fpm):
fpm build
This will automatically resolve the dependency and compile forgex-cli together with forgex.
Operation of this command has been confirmed with the following compilers:
GNU Fortran (gfortran) v13.2.1
Intel Fortran Compiler (ifx) 2024.0.0 20231017
It is assumed that you will use the Fortran Package Manager (fpm).
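Before building, it can help to check that fpm and a compiler are on the PATH. The following is a minimal sketch, assuming a POSIX shell; check_tool is a hypothetical helper name, not part of fpm or forgex-cli:

```shell
#!/bin/sh
# Preflight sketch: report whether each tool named above is on PATH.
# check_tool is a hypothetical helper, not part of fpm or forgex-cli.
check_tool() {
   if command -v "$1" >/dev/null 2>&1; then
      printf '%s: found\n' "$1"
   else
      printf '%s: not found\n' "$1"
   fi
}

for tool in fpm gfortran ifx; do
   check_tool "$tool"
done
```

Only fpm and one of the two compilers need to be reported as found.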
This article describes the basic usage of forgex-cli.
Currently, the find and debug commands, along with the following subcommands and sub-subcommands, can be executed:
forgex-cli
├── find
│   └── match
│       ├── lazy-dfa <pattern> <operator> <input text>
│       ├── dense <pattern> <operator> <input text>
│       └── forgex <pattern> <operator> <input text>
└── debug
    ├── ast <pattern>
    └── thompson <pattern>
Run the forgex-cli command directly, or through fpm run, as follows:
forgex-cli <command> <subcommand> ...
fpm run -- <command> <subcommand> ...
find command
Using the find command with the match subcommand, you can specify an engine and run benchmark tests on regular expression matching with the .in. and .match. operators.
After the subcommand, select an engine from lazy-dfa, dense, and forgex, then specify the pattern, operator, and input string as if you were writing Fortran code that uses Forgex to perform matching.
For instance, execute the find command:
forgex-cli find match lazy-dfa '([a-z]*g+)n?' .match. 'assign'
To run it through fpm run instead:
fpm run --profile release -- find match lazy-dfa '([a-z]*g+)n?' .match. 'assign'
Either way, you will get output similar to the following:
pattern: ([a-z]*g+)n?
text: 'assign'
parse time: 42.9μs
extract literal time: 23.0μs
runs engine: T
compile nfa time: 26.5μs
dfa initialize time: 4.6μs
search time: 617.1μs
matching result: T
automata and tree size: 10324 bytes
========== Thompson NFA ===========
state 1: (?, 5)
state 2: <Accepted>
state 3: (n, 2)(?, 2)
state 4: (g, 7)
state 5: (["a"-"f"], 6)(g, 6)(["h"-"m"], 6)(n, 6)(["o"-"z"], 6)(?, 4)
state 6: (?, 5)
state 7: (?, 8)
state 8: (g, 9)(?, 3)
state 9: (?, 8)
=============== DFA ===============
1 : ["a"-"f"]=>2
2 : ["o"-"z"]=>2 ["h"-"m"]=>2 g=>3
3A: n=>4
4A:
state 1 = ( 1 4 5 )
state 2 = ( 4 5 6 )
state 3A = ( 2 3 4 5 6 7 8 )
state 4A = ( 2 4 5 6 )
===================================
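To compare the three engines on the same input, the find invocation shown above can be wrapped in a small loop. This is a sketch under stated assumptions: FORGEX_CLI and compare_engines are hypothetical names introduced here, and the script falls back to printing the command line when the executable is not on the PATH.

```shell
#!/bin/sh
# Sketch: run the same match against each engine from the command tree.
# FORGEX_CLI is a hypothetical variable naming the executable (assumed default: forgex-cli on PATH).
FORGEX_CLI="${FORGEX_CLI:-forgex-cli}"

# compare_engines <pattern> <operator> <input text>
compare_engines() {
   pattern=$1
   operator=$2
   text=$3
   for engine in lazy-dfa dense forgex; do
      printf '== engine: %s ==\n' "$engine"
      if command -v "$FORGEX_CLI" >/dev/null 2>&1; then
         "$FORGEX_CLI" find match "$engine" "$pattern" "$operator" "$text"
      else
         # Executable not found: print the command instead of running it.
         printf "(dry run) %s find match %s '%s' %s '%s'\n" \
            "$FORGEX_CLI" "$engine" "$pattern" "$operator" "$text"
      fi
   done
}

compare_engines '([a-z]*g+)n?' .match. 'assign'
```

The pattern and input text are single-quoted so that the shell does not expand characters such as * and ? before forgex-cli receives them.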
debug command
The debug command lets you obtain information about the abstract syntax tree and the structure of the Thompson NFA.
For example, execute the debug command with the ast subcommand:
forgex-cli debug ast 'foo[0-9]+bar'
You will get output similar to the following:
parse time: 133.8μs
extract time: 36.8μs
extracted literal:
extracted prefix: foo
extracted suffix: bar
memory (estimated): 848
(concatenate (concatenate (concatenate (concatenate (concatenate (concatenate "f" "o") "o") (concatenate [ "0"-"9";] (closure[ "0"-"9";]))) "b") "a") "r")
Note: the prefix and suffix literals are now extracted as well.
To get the graph of the Thompson NFA, run the following command:
forgex-cli debug thompson 'foo[0-9]+bar'
This will give you output like this:
parse time: 144.5μs
compile nfa time: 57.0μs
memory (estimated): 11589
========== Thompson NFA ===========
state 1: (f, 8)
state 2: <Accepted>
state 3: (r, 2)
state 4: (a, 3)
state 5: (b, 4)
state 6: (["0"-"9"], 9)
state 7: (o, 6)
state 8: (o, 7)
state 9: (?, 10)
state 10: (["0"-"9"], 11)(?, 5)
state 11: (?, 10)
Note: all segments of NFA were disjoined with overlapping portions.
===================================
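Both find and debug report their timings on lines containing "time:", so those lines can be filtered out of the full report. The sketch below uses a hypothetical helper named timing_lines and feeds it a few lines copied from the sample output above:

```shell
#!/bin/sh
# Sketch: keep only the timing lines from forgex-cli output read on stdin.
# timing_lines is a hypothetical helper name, not part of forgex-cli.
timing_lines() {
   grep 'time:'
}

# Sample input copied from the debug thompson output shown above.
timing_lines <<'EOF'
parse time: 144.5μs
compile nfa time: 57.0μs
memory (estimated): 11589
state 1: (f, 8)
EOF
```

Assuming forgex-cli writes its report to standard output, the same helper could be used as forgex-cli debug thompson 'foo[0-9]+bar' | timing_lines.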
Usage information is available through the --help command line argument.
If you use the forgex-cli command with PowerShell on Windows, use UTF-8 as your system locale to properly input and output Unicode characters.
The following features are planned to be implemented in the future:
All code contained herein shall be written with a three-space indentation.
The command-line interface design of forgex-cli was inspired in part by the regex-cli package for Rust.
Forgex-CLI is freely available under the MIT license. See LICENSE.