awk: Merge 20210729 from One True Awk upstream (0592de4a)

July 27, 2021: As per IEEE Std 1003.1-2008, -F "str" is now consistent with -v FS="str" when str is null. Thanks to Warner Losh.

July 24, 2021: Fix readrec's definition of a record. This fixes an issue with NetBSD's RS regular expression support that can cause an infinite read loop. Thanks to Miguel Pineiro Jr.

Fix regular expression RS ^-anchoring. RS ^-anchoring needs to know if it is reading the first record of a file. This change restores a missing line that was overlooked when porting NetBSD's RS regex functionality. Thanks to Miguel Pineiro Jr.

Fix size computation in replace_repeat() for special case REPEAT_WITH_Q. Thanks to Todd C. Miller.

Also, included the tests from upstream, though they aren't yet connected to the tree.

Sponsored by: Netflix
nhuff · Aug 1, 2021 · 23f2437
2 parents a61c24d + f9002b8
Showing 330 changed files with 71,810 additions and 10 deletions.
diff --git a/contrib/one-true-awk/FIXES b/contrib/one-true-awk/FIXES
@@ -25,6 +25,23 @@ THIS SOFTWARE.
 This file lists all bug fixes, changes, etc., made since the AWK book
 was sent to the printers in August, 1987.
 
+July 27, 2021:
+	As per IEEE Std 1003.1-2008, -F "str" is now consistent with
+	-v FS="str" when str is null. Thanks to Warner Losh.
+
+July 24, 2021:
+	Fix readrec's definition of a record. This fixes an issue
+	with NetBSD's RS regular expression support that can cause
+	an infinite read loop. Thanks to Miguel Pineiro Jr.
+
+	Fix regular expression RS ^-anchoring. RS ^-anchoring needs to
+	know if it is reading the first record of a file. This change
+	restores a missing line that was overlooked when porting NetBSD's
+	RS regex functionality. Thanks to Miguel Pineiro Jr.
+
+	Fix size computation in replace_repeat() for special case
+	REPEAT_WITH_Q. Thanks to Todd C. Miller.
+
 February 15, 2021:
 	Small fix so that awk will compile again with g++. Thanks to
 	Arnold Robbins.

diff --git a/contrib/one-true-awk/README.md b/contrib/one-true-awk/README.md
@@ -107,13 +107,17 @@ astonishly slow.  If `awk` seems slow, you might try fixing that.
 More generally, turning on optimization can significantly improve
 `awk`'s speed, perhaps by 1/3 for highest levels.
 
+## A Note About Releases
+
+We don't do releases. 
+
 ## A Note About Maintenance
 
-NOTICE! Maintenance of this program is on a ``best effort''
+NOTICE! Maintenance of this program is on a ''best effort''
 basis.  We try to get to issues and pull requests as quickly
 as we can.  Unfortunately, however, keeping this program going
 is not at the top of our priority list.
 
 #### Last Updated
 
-Fri Dec 25 16:53:34 EST 2020
+Sat Jul 25 14:00:07 EDT 2021
diff --git a/contrib/one-true-awk/TODO b/contrib/one-true-awk/TODO
@@ -0,0 +1,19 @@
+Wed Jan 22 02:10:35 MST 2020
+============================
+
+Here are some things that it'd be nice to have volunteer
+help on.
+
+1. Rework the test suite so that it's easier to maintain
+and see exactly which tests fail:
+	A. Extract beebe.tar into separate file and update scripts
+	B. Split apart multiple tests into separate tests with input
+	   and "ok" files for comparisons.
+
+2. Pull in more of the tests from gawk that only test standard features.
+   The beebe.tar file appears to be from sometime in the 1990s.
+
+3. Make the One True Awk valgrind clean. In particular add a
+   a test suite target that runs valgrind on all the tests and
+   reports if there are any definite losses or any invalid reads
+   or writes (similar to gawk's test of this nature).
diff --git a/contrib/one-true-awk/b.c b/contrib/one-true-awk/b.c
@@ -935,7 +935,7 @@ replace_repeat(const uschar *reptok, int reptoklen, const uschar *atom,
 	if (special_case == REPEAT_PLUS_APPENDED) {
 		size++;		/* for the final + */
 	} else if (special_case == REPEAT_WITH_Q) {
-		size += init_q + (atomlen+1)* n_q_reps;
+		size += init_q + (atomlen+1)* (n_q_reps-init_q);
 	} else if (special_case == REPEAT_ZERO) {
 		size += 2;	/* just a null ERE: () */
 	}
@@ -964,11 +964,8 @@ replace_repeat(const uschar *reptok, int reptoklen, const uschar *atom,
 		}
 	}
 	memcpy(&buf[j], reptok+reptoklen, suffix_length);
-	if (special_case == REPEAT_ZERO) {
-		buf[j+suffix_length] = '\0';
-	} else {
-		buf[size] = '\0';
-	}
+	j += suffix_length;
+	buf[j] = '\0';
 	/* free old basestr */
 	if (firstbasestr != basestr) {
 		if (basestr)
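
Note on the REPEAT_WITH_Q size fix: the following is a minimal sketch, not the code from b.c. It assumes the expansion step appends one optional `?` when `firstnum` is 0 and then one "atom plus `?`" group for each remaining optional repetition; the helper names `q_bytes_emitted`, `q_size_old` and `q_size_new` are hypothetical. Under that assumption the old formula reserves one extra group whenever `init_q` is 1, while the new formula matches the bytes written exactly.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes the REPEAT_WITH_Q expansion is assumed to append after the prefix:
 * an optional leading '?' (when firstnum == 0), then one "atom?" group
 * per remaining optional repetition. */
size_t q_bytes_emitted(int atomlen, int firstnum, int secondnum)
{
	assert(atomlen > 0 && firstnum >= 0 && secondnum > firstnum);
	int init_q = (firstnum == 0);
	int n_q_reps = secondnum - firstnum;
	size_t n = 0;

	if (init_q)
		n++;				/* '?' making the first atom optional */
	for (int i = init_q; i < n_q_reps; i++)
		n += (size_t)atomlen + 1;	/* atom followed by '?' */
	return n;
}

/* Size term from the removed line of the hunk. */
size_t q_size_old(int atomlen, int firstnum, int secondnum)
{
	int init_q = (firstnum == 0);
	int n_q_reps = secondnum - firstnum;

	return (size_t)(init_q + (atomlen + 1) * n_q_reps);
}

/* Size term from the added line of the hunk. */
size_t q_size_new(int atomlen, int firstnum, int secondnum)
{
	int init_q = (firstnum == 0);
	int n_q_reps = secondnum - firstnum;

	return (size_t)(init_q + (atomlen + 1) * (n_q_reps - init_q));
}
```

For a one-byte atom repeated `{0,3}`, the sketch appends 5 bytes, the new term is 5 and the old term is 7. That gap would explain why the second hunk now writes the terminator at `buf[j]` instead of `buf[size]`: with an oversized `size`, `buf[size]` lies past the last byte actually copied.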

diff --git a/contrib/one-true-awk/bugs-fixed/REGRESS b/contrib/one-true-awk/bugs-fixed/REGRESS
@@ -0,0 +1,28 @@
+#! /bin/bash
+
+if [ ! -f ../a.out ]
+then
+	echo Making executable
+	(cd .. ; make) || exit 0
+fi
+
+for i in *.awk
+do
+	echo === $i
+	OUT=${i%.awk}.OUT
+	OK=${i%.awk}.ok
+	IN=${i%.awk}.in
+	input=
+	if [ -f $IN ]
+	then
+		input=$IN
+	fi
+
+	../a.out -f $i $input > $OUT 2>&1
+	if cmp -s $OK $OUT
+	then
+		rm -f $OUT
+	else
+		echo ++++ $i failed!
+	fi
+done
diff --git a/contrib/one-true-awk/bugs-fixed/fs-overflow.ok b/contrib/one-true-awk/bugs-fixed/fs-overflow.ok
@@ -0,0 +1 @@
+foo
diff --git a/contrib/one-true-awk/bugs-fixed/inf-nan-torture.awk b/contrib/one-true-awk/bugs-fixed/inf-nan-torture.awk
@@ -0,0 +1,4 @@
+{
+	for (i = 1; i <= NF; i++)
+		print i, $i, $i + 0
+}
diff --git a/contrib/one-true-awk/bugs-fixed/inf-nan-torture.in b/contrib/one-true-awk/bugs-fixed/inf-nan-torture.in
@@ -0,0 +1 @@
+-inf -inform inform -nan -nancy nancy -123 0 123 +123 nancy +nancy +nan inform +inform +inf
diff --git a/contrib/one-true-awk/bugs-fixed/inf-nan-torture.ok b/contrib/one-true-awk/bugs-fixed/inf-nan-torture.ok
@@ -0,0 +1,16 @@
+1 -inf -inf
+2 -inform 0
+3 inform 0
+4 -nan -nan
+5 -nancy 0
+6 nancy 0
+7 -123 -123
+8 0 0
+9 123 123
+10 +123 123
+11 nancy 0
+12 +nancy 0
+13 +nan +nan
+14 inform 0
+15 +inform 0
+16 +inf +inf
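
Note on the expected output above: only the fields that are exactly a sign followed by `inf` or `nan` keep a non-zero numeric value; `-inform`, `nancy`, `+nancy` and the unsigned words all evaluate to 0. A minimal sketch of a check that would produce that split is shown below. The function name `is_signed_inf_or_nan` is hypothetical, and requiring a sign and lowercase spelling is an assumption, since the test input contains no unsigned `inf` or `nan` and no uppercase forms.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Return true only for "+inf", "-inf", "+nan" and "-nan".
 * Assumption: a leading sign is mandatory and nothing may follow the word. */
bool is_signed_inf_or_nan(const char *s)
{
	assert(s != NULL);
	if (s[0] != '+' && s[0] != '-')
		return false;
	return strcmp(s + 1, "inf") == 0 || strcmp(s + 1, "nan") == 0;
}
```

With this check, `-inf` and `+nan` are accepted, while `-inform`, `nancy` and `123` are rejected and would go through ordinary numeric conversion instead.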
diff --git a/contrib/one-true-awk/bugs-fixed/pfile-overflow.awk b/contrib/one-true-awk/bugs-fixed/pfile-overflow.awk
@@ -0,0 +1 @@
+\
diff --git a/contrib/one-true-awk/bugs-fixed/pfile-overflow.ok b/contrib/one-true-awk/bugs-fixed/pfile-overflow.ok
@@ -0,0 +1,4 @@
+../a.out: syntax error at source line 1 source file pfile-overflow.awk
+ context is
+	 >>>  <<< 
+../a.out: bailing out at source line 1 source file pfile-overflow.awk
diff --git a/contrib/one-true-awk/bugs-fixed/rs_underflow.awk b/contrib/one-true-awk/bugs-fixed/rs_underflow.awk
@@ -0,0 +1 @@
+BEGIN { RS="zx" } { print $1 }
diff --git a/contrib/one-true-awk/bugs-fixed/rs_underflow.in b/contrib/one-true-awk/bugs-fixed/rs_underflow.in
@@ -0,0 +1 @@
+�
diff --git a/contrib/one-true-awk/bugs-fixed/rs_underflow.ok b/contrib/one-true-awk/bugs-fixed/rs_underflow.ok
@@ -0,0 +1 @@
+�
diff --git a/contrib/one-true-awk/lib.c b/contrib/one-true-awk/lib.c
@@ -176,6 +176,7 @@ int getrec(char **pbuf, int *pbufsize, bool isrecord)	/* get next input record *
 				infile = stdin;
 			else if ((infile = fopen(file, "r")) == NULL)
 				FATAL("can't open file %s", file);
+			innew = true;
 			setfval(fnrloc, 0.0);
 		}
 		c = readrec(&buf, &bufsize, infile, innew);
@@ -241,6 +242,7 @@ int readrec(char **pbuf, int *pbufsize, FILE *inf, bool newflag)	/* read one rec
 		}
 		if (found)
 			setptr(patbeg, '\0');
+		isrec = (found == 0 && *buf == '\0') ? false : true;
 	} else {
 		if ((sep = *rs) == 0) {
 			sep = '\n';
@@ -270,10 +272,10 @@ int readrec(char **pbuf, int *pbufsize, FILE *inf, bool newflag)	/* read one rec
 		if (!adjbuf(&buf, &bufsize, 1+rr-buf, recsize, &rr, "readrec 3"))
 			FATAL("input record `%.30s...' too long", buf);
 		*rr = 0;
+		isrec = (c == EOF && rr == buf) ? false : true;
 	}
 	*pbuf = buf;
 	*pbufsize = bufsize;
-	isrec = *buf || !feof(inf);
 	DPRINTF("readrec saw <%s>, returns %d\n", buf, isrec);
 	return isrec;
 }
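
Note on the new record rule in the single-character separator branch: `isrec` is false only when EOF is reached before any byte of a record has been stored (`c == EOF && rr == buf`). The sketch below illustrates that rule in isolation. It is not the readrec() code: `struct input`, `next_char`, `read_record` and `count_records` are hypothetical names, and an in-memory buffer stands in for the `FILE *` and `getc()` used in lib.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>	/* EOF */
#include <string.h>

/* Hypothetical in-memory stand-in for the input stream. */
struct input {
	const char *data;
	size_t len;
	size_t pos;
};

/* Return the next byte, or EOF once the data is exhausted. */
int next_char(struct input *in)
{
	return in->pos < in->len ? (unsigned char)in->data[in->pos++] : EOF;
}

/* Read one record terminated by sep into buf (capacity cap, NUL included).
 * Returns false only when EOF arrives before any byte was stored. */
bool read_record(struct input *in, int sep, char *buf, size_t cap)
{
	assert(in != NULL && buf != NULL && cap > 1);
	char *rr = buf;
	int c;

	while ((c = next_char(in)) != EOF && c != sep) {
		if ((size_t)(rr - buf) + 1 < cap)	/* sketch: truncate long records */
			*rr++ = (char)c;
	}
	*rr = '\0';
	return !(c == EOF && rr == buf);
}

/* Count the records in a NUL-terminated string. */
int count_records(const char *data, int sep)
{
	struct input in = { data, strlen(data), 0 };
	char buf[256];
	int n = 0;

	while (read_record(&in, sep, buf, sizeof buf))
		n++;
	return n;
}
```

Under this rule an empty line in the middle of the input still counts as a record, because the separator rather than EOF ended the read, and a final record without a trailing separator also counts, because bytes were stored before EOF.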

diff --git a/contrib/one-true-awk/main.c b/contrib/one-true-awk/main.c
@@ -22,7 +22,7 @@ ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF
 THIS SOFTWARE.
 ****************************************************************/
 
-const char	*version = "version 20210215";
+const char	*version = "version 20210724";
 
 #define DEBUG
 #include <stdio.h>

diff --git a/contrib/one-true-awk/testdir/Compare.T1 b/contrib/one-true-awk/testdir/Compare.T1
@@ -0,0 +1,10 @@
+
+oldawk=${oldawk-awk}
+awk=${awk-../a.out}
+
+echo oldawk=$oldawk, awk=$awk
+
+for i in T.*
+do 
+	$i
+done
diff --git a/contrib/one-true-awk/testdir/Compare.drek b/contrib/one-true-awk/testdir/Compare.drek
@@ -0,0 +1,35 @@
+# an arbitrary collection of input data
+
+cat td.1 td.1 >foo.td
+sed 's/^........................//' td.1 >>foo.td
+pr -m td.1 td.1 td.1 >>foo.td
+pr -2 td.1 >>foo.td
+wc foo.td
+
+td=foo.td
+>footot
+
+for i in $*
+do
+	echo $i >/dev/tty
+	echo $i '<<<'
+	cd ..
+	echo testdir/$i:
+	ind <testdir/$i
+	a.out -f testdir/$i >drek.c
+	cat drek.c
+	make drek || ( echo $i '	' bad compile; echo $i '	' bad compile >/dev/tty; continue )
+	cd testdir
+
+	time /usr/bin/awk -f $i $td >foo2 2>foo2t
+	cat foo2t
+	time ../drek $td >foo1 2>foo1t
+	cat foo1t
+	cmp foo1 foo2 || ( echo $i '	' bad; echo $i '	' bad >/dev/tty; diff foo1 foo2 | sed 20q )
+	echo '>>>' $i
+	echo
+	echo $i: >>footot
+	cat foo1t foo2t >>footot
+done
+
+ctimes footot
diff --git a/contrib/one-true-awk/testdir/Compare.p b/contrib/one-true-awk/testdir/Compare.p
@@ -0,0 +1,17 @@
+
+oldawk=${oldawk-awk}
+awk=${awk-../a.out}
+
+echo oldawk=$oldawk, awk=$awk
+
+for i
+do
+	echo "$i:"
+	$oldawk -f $i test.countries test.countries >foo1 
+	$awk -f $i test.countries test.countries >foo2 
+	if cmp -s foo1 foo2
+	then true
+	else echo -n "$i:	BAD ..."
+	fi
+	diff -b foo1 foo2 | sed -e 's/^/	/' -e 10q
+done
diff --git a/contrib/one-true-awk/testdir/Compare.t b/contrib/one-true-awk/testdir/Compare.t
@@ -0,0 +1,17 @@
+
+oldawk=${oldawk-myawk}
+awk=${awk-../a.out}
+
+echo oldawk=$oldawk, awk=$awk
+
+for i
+do
+	echo "$i:"
+	$oldawk -f $i test.data >foo1 
+	$awk -f $i test.data >foo2 
+	if cmp -s foo1 foo2
+	then true
+	else echo -n "$i:	BAD ..."
+	fi
+	diff -b foo1 foo2 | sed -e 's/^/	/' -e 10q
+done
diff --git a/contrib/one-true-awk/testdir/Compare.tt b/contrib/one-true-awk/testdir/Compare.tt
@@ -0,0 +1,49 @@
+#!/bin/sh
+
+oldawk=${oldawk-awk}
+awk=${awk-../a.out}
+
+echo compiling time.c
+gcc time.c -o time
+time=./time
+
+echo time command = $time
+
+#case `uname` in
+#SunOS)
+#	time=/usr/bin/time ;;
+#Linux)
+#	time=/usr/bin/time ;;
+#*)
+#	time=time ;;
+#esac
+
+echo oldawk = $oldawk, awk = $awk, time command = $time
+
+
+# an arbitrary collection of input data
+
+cat td.1 td.1 >foo.td
+sed 's/^........................//' td.1 >>foo.td
+pr -m td.1 td.1 td.1 >>foo.td
+pr -2 td.1 >>foo.td
+cat bib >>foo.td
+wc foo.td
+
+td=foo.td
+>footot
+
+for i in $*
+do
+	echo $i "($oldawk vs $awk)":
+	# ind <$i
+	$time $oldawk -f $i $td >foo2 2>foo2t
+	cat foo2t
+	$time $awk -f $i $td >foo1 2>foo1t
+	cat foo1t
+	cmp foo1 foo2
+	echo $i: >>footot
+	cat foo1t foo2t >>footot
+done
+
+ctimes footot
diff --git a/contrib/one-true-awk/testdir/NOTES b/contrib/one-true-awk/testdir/NOTES
@@ -0,0 +1,10 @@
+Need some tests for octal, hex, various string escapes.
+
+Need to complete the sub and gsub tests.
+
+more on printf, especially weird formats
+
+more on operators
+
+
+never throw away a test
diff --git a/contrib/one-true-awk/testdir/README.TESTS b/contrib/one-true-awk/testdir/README.TESTS
@@ -0,0 +1,44 @@
+The archive of test files contains 
+
+- A shell file called REGRESS that controls the testing process.
+
+- Several shell files called Compare* that control sub-parts
+of the testing.
+
+- About 160 small tests called t.* that constitute a random
+sampling of awk constructions collected over the years.
+Not organized, but they touch almost everything.
+
+- About 60 small tests called p.* that come from the first
+two chapters of The AWK Programming Language.  This is
+basic stuff -- they have to work.
+
+These two sets are intended as regression tests, to be sure
+that a new version produces the same results as a previous one.
+There are a couple of standard data files used with them,
+test.data and test.countries, but others would work too.
+
+- About 20 files called T.* that are self-contained and
+more systematic tests of specific language features.
+For example, T.clv tests command-line variable handling.
+These tests are not regressions -- they compute the right
+answer by separate means, then compare the awk output.
+A specific test for each new bug found shows up in at least
+one of these, most often T.misc.  There are about 220 tests
+total in these files.
+
+- Two of these files, T.re and T.sub, are systematic tests
+of the regular expression and substitution code.  They express
+tests in a small language, then generate awk programs that
+verify behavior.
+
+- About 20 files called tt.* that are used as timing tests;
+they use the most common awk constructions in straightforward
+ways, against a large input file constructed by Compare.tt.
+
+
+There is undoubtedly more stuff in the archive;  it's been
+collecting for years and may need pruning.  Suggestions for
+improvement, additional tests (especially systematic ones),
+and the like are all welcome.
+