saw is a command-line application written in rust that aims to combine common use cases of sed and awk into one tool with an intuitive and easy-to-use syntax.
saw is still in development and is not feature-complete yet.
saw processes text line-by-line either from a file or from standard input. Its basic usage is
saw [-f <path>] <prog> ...
where <path>
is the path of an input file.
A saw program consists of a list of "atoms". An atom can be considered a text processing unit that takes an input string, modifies it, and then either passes it on to the next atom in the list or prints it to the screen.
Atoms may also have an internal state that allows them to be "context aware"; that is, an atom's behaviour might depend on the lines previously processed.
Consider the following INI file.
; example.ini
[Section 1]
name=value1
[Section 2]
name=value2
The line in [Section 2]
starting with name
can be extracted from this file
with the following command:
saw -f example.ini filter-range '^\[Section 2' '^\[' filter '^name'
This saw program can be read as:
First, filter for everything in between lines starting with
[Section 2
and[
; then filter the remaining lines for ones starting withname
.
Note that filter-range
and filter
can be abbreviated to fr
and f
,
respectively. Other atom identifier have similar abbreviations; details can be found in the Section List of Atoms below.
This is also one of the examples that led to the creation of saw since doing this with awk or sed requires a fair amount of boilerplate code.
saw's atoms always consist of one keyword (that might have one or more
aliases) as well as a fixed number of arguments. The keyword and
arguments are passed as seperate arguments to the saw
command.
Note that the regular expressions used in saw
are the ones from rust's
regex crate; it syntax can be found
here.
filter <regex>
Aliases: f
grep-like filter: If a the input matches <regex>
, it is passed on
without modification. If not, filter
stops the processing of the
current line and the next line is processed.
match <regex>
Aliases: m
Restrict the usage of following atoms to lines matching <regex>
.
It is similar to filter
with the only difference being that input
not matching <regex>
is printed out without modifications instead
of being discarded.
sub <regex> <replacement>
Aliases: s
Replaces the first match of <regex>
with <replacement>
.
Numbered and named capture groups are supported and can be referenced in
<replacement>
as ${#}
or ${name}
. The braces can be dropped
if the following character is not a character allowed in the name of a
caputure group (See also
here).
gsub <regex> <replacement>
Aliases: g
Like sub
but replaces all occurences instead of only the first one.
enumerate
Aliases: enum
, e
, #
Enumerates the input by prepending '<nr>
', where <nr>
is the number
of inputs processed so far. Note that this is not neccessarily the line
number if enumerate
is behind a filter
or match
.
fields <fields>
Aliases: F
Splits the input at whitespaces and passes on only the fields specified by
<fields>
seperated by single spaces.
The individual fields are identified by position either by a positive or
by a parenthesized negative integer; 1
and (-1)
represent the first
and last field, respectively.
<fields>
is then a comma-separated list of either single fields or
two hyphen-separated fields representing a range of fields with the
upper bound included. The upper and/or lower bounds may be dropped to
represent unbounded ranges. Examples of valid inputs for <fields>
are
2
, 1,3-(-2)
and 2,4-
.
lines <lines>
Aliases: line
, l
Filters the input by count for inputs specified by <lines>
. The
syntax for <lines>
is the same as for <fields>
above with the
difference that only positive integers are allowed.
For example, lines 1,5-
will only "leave through"
the first input it received and every input starting with the fifth.
If lines <lines>
is the first atom in a saw program, it simply filters
out all the lines of the input text except for the ones specified by
<lines>
(hence the name).
filter-range <regex1> <regex2>
Aliases: fr
Context aware atom that filters the input for blocks beginning with an
input matching <regex1>
and ending with an input matching <regex2>
.
It also resets for each block the internal state of all atoms following
it.
TODO: Add example
match-range <regex1> <regex2>
Aliases: mr
Restrict the usage of following atoms to blocks of inputs delimited by inputs matching <regex1>
and <regex2>
, respectively.
It is similar to filter-range
with the only difference being that input
outside of those blocks is printed out without modifications instead
of being discarded.