awk: Merge 20210729 from One True Awk upstream (0592de4a)
July 27, 2021:
	As per IEEE Std 1003.1-2008, -F "str" is now consistent with
	-v FS="str" when str is null. Thanks to Warner Losh.

July 24, 2021:
	Fix readrec's definition of a record. This fixes an issue
	with NetBSD's RS regular expression support that can cause
	an infinite read loop. Thanks to Miguel Pineiro Jr.

	Fix regular expression RS ^-anchoring. RS ^-anchoring needs to
	know if it is reading the first record of a file. This change
	restores a missing line that was overlooked when porting NetBSD's
	RS regex functionality. Thanks to Miguel Pineiro Jr.

	Fix size computation in replace_repeat() for special case
	REPEAT_WITH_Q. Thanks to Todd C. Miller.

Also include the tests from upstream, though they aren't yet connected
to the tree.

Sponsored by:		Netflix
bsdimp committed Aug 1, 2021
2 parents a61c24d + f9002b8 commit 23f2437
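
The RS `^`-anchoring fix in the commit message depends on knowing whether the first record of a file is being read. A rough analogue, using POSIX `regex.h` rather than awk's own matcher: `REG_NOTBOL` tells `regexec` that the buffer does not start a line, so `^` cannot match there. The helper name `rs_matches` and the `first_record` flag are hypothetical; they only stand in for the role `innew` appears to play in `getrec`.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper, not taken from lib.c or b.c.
 * Reports whether the compiled separator pattern matches somewhere in buf.
 * Only the first record of a file is allowed to satisfy a leading '^'.
 */
static bool
rs_matches(const regex_t *re, const char *buf, bool first_record)
{
    assert(re != NULL && buf != NULL);

    /* REG_NOTBOL: treat the start of buf as "not the beginning of a line". */
    int eflags = first_record ? 0 : REG_NOTBOL;

    return regexec(re, buf, 0, NULL, eflags) == 0;
}
```

With a pattern such as `^a` compiled without `REG_NEWLINE`, the same buffer matches when `first_record` is true and does not match otherwise, which is the distinction the restored line makes available to the reader.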
Showing 330 changed files with 71,810 additions and 10 deletions.
17 changes: 17 additions & 0 deletions contrib/one-true-awk/FIXES
@@ -25,6 +25,23 @@ THIS SOFTWARE.
This file lists all bug fixes, changes, etc., made since the AWK book
was sent to the printers in August, 1987.

+July 27, 2021:
+	As per IEEE Std 1003.1-2008, -F "str" is now consistent with
+	-v FS="str" when str is null. Thanks to Warner Losh.
+
+July 24, 2021:
+	Fix readrec's definition of a record. This fixes an issue
+	with NetBSD's RS regular expression support that can cause
+	an infinite read loop. Thanks to Miguel Pineiro Jr.
+
+	Fix regular expression RS ^-anchoring. RS ^-anchoring needs to
+	know if it is reading the first record of a file. This change
+	restores a missing line that was overlooked when porting NetBSD's
+	RS regex functionality. Thanks to Miguel Pineiro Jr.
+
+	Fix size computation in replace_repeat() for special case
+	REPEAT_WITH_Q. Thanks to Todd C. Miller.
+
February 15, 2021:
Small fix so that awk will compile again with g++. Thanks to
Arnold Robbins.
8 changes: 6 additions & 2 deletions contrib/one-true-awk/README.md
@@ -107,13 +107,17 @@ astonishly slow. If `awk` seems slow, you might try fixing that.
More generally, turning on optimization can significantly improve
`awk`'s speed, perhaps by 1/3 for highest levels.

+## A Note About Releases
+
+We don't do releases.
+
## A Note About Maintenance

-NOTICE! Maintenance of this program is on a ``best effort''
+NOTICE! Maintenance of this program is on a ''best effort''
basis. We try to get to issues and pull requests as quickly
as we can. Unfortunately, however, keeping this program going
is not at the top of our priority list.

#### Last Updated

-Fri Dec 25 16:53:34 EST 2020
+Sat Jul 25 14:00:07 EDT 2021
19 changes: 19 additions & 0 deletions contrib/one-true-awk/TODO
@@ -0,0 +1,19 @@
Wed Jan 22 02:10:35 MST 2020
============================

Here are some things that it'd be nice to have volunteer
help on.

1. Rework the test suite so that it's easier to maintain
and see exactly which tests fail:
A. Extract beebe.tar into separate file and update scripts
B. Split apart multiple tests into separate tests with input
and "ok" files for comparisons.

2. Pull in more of the tests from gawk that only test standard features.
The beebe.tar file appears to be from sometime in the 1990s.

3. Make the One True Awk valgrind clean. In particular add a
a test suite target that runs valgrind on all the tests and
reports if there are any definite losses or any invalid reads
or writes (similar to gawk's test of this nature).
9 changes: 3 additions & 6 deletions contrib/one-true-awk/b.c
@@ -935,7 +935,7 @@ replace_repeat(const uschar *reptok, int reptoklen, const uschar *atom,
if (special_case == REPEAT_PLUS_APPENDED) {
size++; /* for the final + */
} else if (special_case == REPEAT_WITH_Q) {
-size += init_q + (atomlen+1)* n_q_reps;
+size += init_q + (atomlen+1)* (n_q_reps-init_q);
} else if (special_case == REPEAT_ZERO) {
size += 2; /* just a null ERE: () */
}
@@ -964,11 +964,8 @@ replace_repeat(const uschar *reptok, int reptoklen, const uschar *atom,
}
}
memcpy(&buf[j], reptok+reptoklen, suffix_length);
-if (special_case == REPEAT_ZERO) {
-buf[j+suffix_length] = '\0';
-} else {
-buf[size] = '\0';
-}
+j += suffix_length;
+buf[j] = '\0';
/* free old basestr */
if (firstbasestr != basestr) {
if (basestr)
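
The size fix in the first `b.c` hunk can be checked with a small standalone sketch. This is not the code from `b.c`: the helper names `emit_q_reps`, `old_size`, and `new_size` are hypothetical, and the expansion loop is an assumption (a bare `?` first when `init_q` is 1, followed by `n_q_reps - init_q` copies of the atom, each with a trailing `?`). Under that assumption the old formula over-counts by `atomlen + 1` whenever `init_q` is 1.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for the REPEAT_WITH_Q expansion (assumed shape).
 * Writes the optional repetitions into out and returns the bytes written.
 */
static size_t
emit_q_reps(char *out, const char *atom, size_t atomlen, int init_q, int n_q_reps)
{
    size_t j = 0;

    if (init_q)
        out[j++] = '?';                 /* first repetition becomes optional */
    for (int i = init_q; i < n_q_reps; i++) {
        memcpy(&out[j], atom, atomlen); /* one more copy of the atom ... */
        j += atomlen;
        out[j++] = '?';                 /* ... which is itself optional */
    }
    return j;
}

/* Size term as computed before the change. */
static size_t
old_size(size_t atomlen, int init_q, int n_q_reps)
{
    return (size_t)init_q + (atomlen + 1) * (size_t)n_q_reps;
}

/* Size term as computed after the change. */
static size_t
new_size(size_t atomlen, int init_q, int n_q_reps)
{
    assert(n_q_reps >= init_q);
    return (size_t)init_q + (atomlen + 1) * (size_t)(n_q_reps - init_q);
}
```

For a two-byte atom with `init_q = 1` and `n_q_reps = 3`, the loop writes 7 bytes; the old formula reserves 10 and the new one 7. Terminating at `buf[j]`, as the second hunk does, places the NUL where writing actually stopped rather than at the computed size.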
28 changes: 28 additions & 0 deletions contrib/one-true-awk/bugs-fixed/REGRESS
@@ -0,0 +1,28 @@
#! /bin/bash

if [ ! -f ../a.out ]
then
echo Making executable
(cd .. ; make) || exit 0
fi

for i in *.awk
do
echo === $i
OUT=${i%.awk}.OUT
OK=${i%.awk}.ok
IN=${i%.awk}.in
input=
if [ -f $IN ]
then
input=$IN
fi

../a.out -f $i $input > $OUT 2>&1
if cmp -s $OK $OUT
then
rm -f $OUT
else
echo ++++ $i failed!
fi
done
1 change: 1 addition & 0 deletions contrib/one-true-awk/bugs-fixed/fs-overflow.ok
@@ -0,0 +1 @@
foo
4 changes: 4 additions & 0 deletions contrib/one-true-awk/bugs-fixed/inf-nan-torture.awk
@@ -0,0 +1,4 @@
{
for (i = 1; i <= NF; i++)
print i, $i, $i + 0
}
1 change: 1 addition & 0 deletions contrib/one-true-awk/bugs-fixed/inf-nan-torture.in
@@ -0,0 +1 @@
-inf -inform inform -nan -nancy nancy -123 0 123 +123 nancy +nancy +nan inform +inform +inf
16 changes: 16 additions & 0 deletions contrib/one-true-awk/bugs-fixed/inf-nan-torture.ok
@@ -0,0 +1,16 @@
1 -inf -inf
2 -inform 0
3 inform 0
4 -nan -nan
5 -nancy 0
6 nancy 0
7 -123 -123
8 0 0
9 123 123
10 +123 123
11 nancy 0
12 +nancy 0
13 +nan +nan
14 inform 0
15 +inform 0
16 +inf +inf
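
Read together, the input and expected output above suggest a rule: a field counts as infinity or NaN only when it is a sign followed by exactly `inf` or `nan`, while look-alikes such as `inform` and `+nancy` evaluate to 0. A minimal sketch of such a check follows. The function name `is_signed_inf_or_nan` is hypothetical and the rule is inferred from the test data alone, not from awk's source. The input contains no unsigned `inf` or `nan` and no upper-case spellings, so rejecting those is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical predicate inferred from inf-nan-torture.ok.
 * Accepts "+inf", "-inf", "+nan", "-nan" and nothing else.
 */
static bool
is_signed_inf_or_nan(const char *s)
{
    assert(s != NULL);

    if (*s != '+' && *s != '-')
        return false;   /* assumption: a leading sign is required */
    s++;

    /* Exact match only, so "inform" and "nancy" are rejected. */
    return strcmp(s, "inf") == 0 || strcmp(s, "nan") == 0;
}
```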
1 change: 1 addition & 0 deletions contrib/one-true-awk/bugs-fixed/pfile-overflow.awk
@@ -0,0 +1 @@
\
4 changes: 4 additions & 0 deletions contrib/one-true-awk/bugs-fixed/pfile-overflow.ok
@@ -0,0 +1,4 @@
../a.out: syntax error at source line 1 source file pfile-overflow.awk
context is
>>> <<<
../a.out: bailing out at source line 1 source file pfile-overflow.awk
1 change: 1 addition & 0 deletions contrib/one-true-awk/bugs-fixed/rs_underflow.awk
@@ -0,0 +1 @@
BEGIN { RS="zx" } { print $1 }
1 change: 1 addition & 0 deletions contrib/one-true-awk/bugs-fixed/rs_underflow.in
@@ -0,0 +1 @@
1 change: 1 addition & 0 deletions contrib/one-true-awk/bugs-fixed/rs_underflow.ok
@@ -0,0 +1 @@
4 changes: 3 additions & 1 deletion contrib/one-true-awk/lib.c
@@ -176,6 +176,7 @@ int getrec(char **pbuf, int *pbufsize, bool isrecord) /* get next input record *
infile = stdin;
else if ((infile = fopen(file, "r")) == NULL)
FATAL("can't open file %s", file);
+innew = true;
setfval(fnrloc, 0.0);
}
c = readrec(&buf, &bufsize, infile, innew);
@@ -241,6 +242,7 @@ int readrec(char **pbuf, int *pbufsize, FILE *inf, bool newflag) /* read one rec
}
if (found)
setptr(patbeg, '\0');
+isrec = (found == 0 && *buf == '\0') ? false : true;
} else {
if ((sep = *rs) == 0) {
sep = '\n';
@@ -270,10 +272,10 @@ int readrec(char **pbuf, int *pbufsize, FILE *inf, bool newflag) /* read one rec
if (!adjbuf(&buf, &bufsize, 1+rr-buf, recsize, &rr, "readrec 3"))
FATAL("input record `%.30s...' too long", buf);
*rr = 0;
+isrec = (c == EOF && rr == buf) ? false : true;
}
*pbuf = buf;
*pbufsize = bufsize;
-isrec = *buf || !feof(inf);
DPRINTF("readrec saw <%s>, returns %d\n", buf, isrec);
return isrec;
}
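
The new test in the single-character branch, `(c == EOF && rr == buf)`, treats an empty buffer as "no record" only when end of input was reached without consuming anything. A simplified sketch of that rule, reading from a string instead of a `FILE *`: the name `read_record` and its signature are hypothetical, and the sketch leaves out the regular-expression and paragraph-mode branches of `readrec`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for the single-character-separator branch.
 * Copies one record from *src into buf, advances *src past the separator,
 * and returns true if a record (possibly empty) was read.
 */
static bool
read_record(const char **src, char sep, char *buf, size_t bufsize)
{
    const char *p = *src;
    size_t n = 0;
    bool hit_end = false;

    assert(bufsize > 0);
    for (;;) {
        if (*p == '\0') {       /* plays the role of c == EOF */
            hit_end = true;
            break;
        }
        if (*p == sep) {        /* separator ends the record */
            p++;
            break;
        }
        if (n + 1 < bufsize)    /* overlong records are truncated here */
            buf[n++] = *p;
        p++;
    }
    buf[n] = '\0';
    *src = p;

    /* Mirrors (c == EOF && rr == buf) ? false : true */
    return !(hit_end && n == 0);
}
```

On the input `"a\n\nb"` with `'\n'` as the separator, this yields three records (`a`, an empty record, and `b`) and then reports that no record remains, so an empty line in the middle of the input still counts as a record.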
2 changes: 1 addition & 1 deletion contrib/one-true-awk/main.c
@@ -22,7 +22,7 @@ ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF
THIS SOFTWARE.
****************************************************************/

-const char *version = "version 20210215";
+const char *version = "version 20210724";

#define DEBUG
#include <stdio.h>
10 changes: 10 additions & 0 deletions contrib/one-true-awk/testdir/Compare.T1
@@ -0,0 +1,10 @@

oldawk=${oldawk-awk}
awk=${awk-../a.out}

echo oldawk=$oldawk, awk=$awk

for i in T.*
do
$i
done
35 changes: 35 additions & 0 deletions contrib/one-true-awk/testdir/Compare.drek
@@ -0,0 +1,35 @@
# an arbitrary collection of input data

cat td.1 td.1 >foo.td
sed 's/^........................//' td.1 >>foo.td
pr -m td.1 td.1 td.1 >>foo.td
pr -2 td.1 >>foo.td
wc foo.td

td=foo.td
>footot

for i in $*
do
echo $i >/dev/tty
echo $i '<<<'
cd ..
echo testdir/$i:
ind <testdir/$i
a.out -f testdir/$i >drek.c
cat drek.c
make drek || ( echo $i ' ' bad compile; echo $i ' ' bad compile >/dev/tty; continue )
cd testdir

time /usr/bin/awk -f $i $td >foo2 2>foo2t
cat foo2t
time ../drek $td >foo1 2>foo1t
cat foo1t
cmp foo1 foo2 || ( echo $i ' ' bad; echo $i ' ' bad >/dev/tty; diff foo1 foo2 | sed 20q )
echo '>>>' $i
echo
echo $i: >>footot
cat foo1t foo2t >>footot
done

ctimes footot
17 changes: 17 additions & 0 deletions contrib/one-true-awk/testdir/Compare.p
@@ -0,0 +1,17 @@

oldawk=${oldawk-awk}
awk=${awk-../a.out}

echo oldawk=$oldawk, awk=$awk

for i
do
echo "$i:"
$oldawk -f $i test.countries test.countries >foo1
$awk -f $i test.countries test.countries >foo2
if cmp -s foo1 foo2
then true
else echo -n "$i: BAD ..."
fi
diff -b foo1 foo2 | sed -e 's/^/ /' -e 10q
done
17 changes: 17 additions & 0 deletions contrib/one-true-awk/testdir/Compare.t
@@ -0,0 +1,17 @@

oldawk=${oldawk-myawk}
awk=${awk-../a.out}

echo oldawk=$oldawk, awk=$awk

for i
do
echo "$i:"
$oldawk -f $i test.data >foo1
$awk -f $i test.data >foo2
if cmp -s foo1 foo2
then true
else echo -n "$i: BAD ..."
fi
diff -b foo1 foo2 | sed -e 's/^/ /' -e 10q
done
49 changes: 49 additions & 0 deletions contrib/one-true-awk/testdir/Compare.tt
@@ -0,0 +1,49 @@
#!/bin/sh

oldawk=${oldawk-awk}
awk=${awk-../a.out}

echo compiling time.c
gcc time.c -o time
time=./time

echo time command = $time

#case `uname` in
#SunOS)
# time=/usr/bin/time ;;
#Linux)
# time=/usr/bin/time ;;
#*)
# time=time ;;
#esac

echo oldawk = $oldawk, awk = $awk, time command = $time


# an arbitrary collection of input data

cat td.1 td.1 >foo.td
sed 's/^........................//' td.1 >>foo.td
pr -m td.1 td.1 td.1 >>foo.td
pr -2 td.1 >>foo.td
cat bib >>foo.td
wc foo.td

td=foo.td
>footot

for i in $*
do
echo $i "($oldawk vs $awk)":
# ind <$i
$time $oldawk -f $i $td >foo2 2>foo2t
cat foo2t
$time $awk -f $i $td >foo1 2>foo1t
cat foo1t
cmp foo1 foo2
echo $i: >>footot
cat foo1t foo2t >>footot
done

ctimes footot
10 changes: 10 additions & 0 deletions contrib/one-true-awk/testdir/NOTES
@@ -0,0 +1,10 @@
Need some tests for octal, hex, various string escapes.

Need to complete the sub and gsub tests.

more on printf, especially weird formats

more on operators


never throw away a test
44 changes: 44 additions & 0 deletions contrib/one-true-awk/testdir/README.TESTS
@@ -0,0 +1,44 @@
The archive of test files contains

- A shell file called REGRESS that controls the testing process.

- Several shell files called Compare* that control sub-parts
of the testing.

- About 160 small tests called t.* that constitute a random
sampling of awk constructions collected over the years.
Not organized, but they touch almost everything.

- About 60 small tests called p.* that come from the first
two chapters of The AWK Programming Language. This is
basic stuff -- they have to work.

These two sets are intended as regression tests, to be sure
that a new version produces the same results as a previous one.
There are a couple of standard data files used with them,
test.data and test.countries, but others would work too.

- About 20 files called T.* that are self-contained and
more systematic tests of specific language features.
For example, T.clv tests command-line variable handling.
These tests are not regressions -- they compute the right
answer by separate means, then compare the awk output.
A specific test for each new bug found shows up in at least
one of these, most often T.misc. There are about 220 tests
total in these files.

- Two of these files, T.re and T.sub, are systematic tests
of the regular expression and substitution code. They express
tests in a small language, then generate awk programs that
verify behavior.

- About 20 files called tt.* that are used as timing tests;
they use the most common awk constructions in straightforward
ways, against a large input file constructed by Compare.tt.


There is undoubtedly more stuff in the archive; it's been
collecting for years and may need pruning. Suggestions for
improvement, additional tests (especially systematic ones),
and the like are all welcome.
