From coff at tuhs.org Tue Oct 7 03:39:10 2025 From: coff at tuhs.org (Douglas McIlroy via COFF) Date: Mon, 6 Oct 2025 13:39:10 -0400 Subject: [COFF] [TUHS] Unix gre, forgotten successor to grep (was: forth on early unix) In-Reply-To: <70D71E86-7484-4BB6-AF0C-2FFC1FC9B710@archibald.dev> References: <96A17F58-C1D8-4CA6-BF2F-EABDE17DF02C@archibald.dev> <70D71E86-7484-4BB6-AF0C-2FFC1FC9B710@archibald.dev> Message-ID: Since QED predated Unix, I'm redirecting this to COFF. Ken's CACM article evoked an unusually harsh response in Computing Reviews. The reviewer said roughly that everybody knows one can make a deterministic recognizer that runs in linear time, so why waste our time talking about an NFA recognizer? This moved me to write a letter to the editor of CR in Ken's defense. I pointed out that a deterministic recognizer can have exponentially more states than the number of symbols in the regular expression. This might well overflow memories of the time and in any event would take exponential time to construct and completely wipe out the advantage of linear recognition time. (Al Aho had not yet invented the egrep algorithm, which constructs only the states encountered during recognition.) Computing Reviews did not have a letters section, so, as far as I know, the off-base review still stands unrebutted in the literature. Doug On Mon, Oct 6, 2025 at 12:04 AM Thalia Archibald wrote: > > Ken, > > Your email reminds me of a comment you made in a 1989 interview with Mike > Mahoney, that suggests something earlier than QED: > > > I did a lot of compiling. Even in college and out of college I did a lot of > > on-the-fly compilers. Ah. ah. I wrote a GREP-like program. It would... You > > type in …, you’d say what you wanted it to look for, and a sed-like thing > > also. That you’d say, I want to do a substitute of A for B or some block of > > text. 
What it would do is compile a program that would look for A and > > substitute in B and then run the compiled program so that one level removed > > from it do I direct my (unclear) and the early languages, the regular > > expression searching stuff in ED and its predecessors on CTSS and those things > > were in fact compilers for searches. They in fact compiled regular... > > https://www.tuhs.org/Archive/Documentation/OralHistory/transcripts/thompson.htm > > By anyone's history of regular expressions, your matcher in QED was the first > software implementation of regular expressions. Was this grep-like program you > wrote in college something earlier than that? Could you share more about it? Do > you somehow still have the source for these? I'd love to study it. > > Thalia > > On Sep 23, 2025, at 11:40, Ken Thompson wrote: > > i think the plan9 grep is the fastest. > > it is grep, egrep, fgrep also. > > i think it is faster than boyer-moore. > > the whole program is a jit dfa > > > > read block > > for c in block > > { > > s=s.state[c] > > if s == nil do something occasionally > > } > > > > it is a very few cycles per byte. all of the > > time is reading a block. i cant imagine b/m > > could be faster. the best b/m could do is > > calculate the skip and then jump over > > bytes that you have already read. > > > > > > russ cox used it to do the (now defunct) code > > search in google. > > From coff at tuhs.org Sat Oct 18 11:44:02 2025 From: coff at tuhs.org (steve jenkin via COFF) Date: Sat, 18 Oct 2025 12:44:02 +1100 Subject: [COFF] [TUHS] To NDEBUG or not to NDEBUG, that is the question In-Reply-To: References: Message-ID: <08014FB9-483A-4ED7-BE5B-BC06D3EA24C6@canb.auug.org.au> This thread, responding to the original, moved to COFF, not about Early Unix. ============================================================ > On 17 Oct 2025, at 22:42, Aharon Robbins via TUHS wrote: > > Now, I can understand why assert() and NDEBUG work the way they do. 
> Particularly on the small PDP-11s on which C and Unix were developed,
> it made sense to have a way to remove assertions from code that would
> be installed for all users.

How many computing workloads are now CPU limited, and can’t afford run-time Sanity Checking in Userland?

For decades, people would try to ‘optimise’ performance by initially writing in assembler [ that myth dealt with by others ]. That appears to have flipped to using huge, slow Frameworks, such as JavaScript / ECMAScript, for ‘Applications’.

I’m not advocating “CPU is free, we can afford to forget about optimisation”. That’s fine for prototypes and ‘run once or twice’ code, where human time matters more, but not in high-volume production workloads. The deliberate creation of bloat & wasting of resources (== energy & dollars) in production work isn’t Professional behaviour IMHO.

10-15 years ago I saw something about Google’s web server CPU utilisation being 60%-70%, from memory. It struck me that “% CPU” wasn’t a good metric for throughput anymore, and ‘system performance’ was a complex, multi-factored problem that had to be tuned per workload and per target metric for ‘performance’. Low latency is only achieved at the cost of throughput; Google may have deliberately opted for lower %CPU to stay responsive.

Around the same time, there were articles about the throughput increase and latency improvement from some large site moving to SSDs. IIRC, their CPU utilisation dropped markedly as well: removing the burden of I/O waits, and the deep scheduling queues they cause, somehow reduced total kernel overhead. Perhaps fewer VM page faults because of shorter process residency…

I’ve no data on modern Supercomputers - I’d expect there to be huge effort in tuning resources for individual applications & data sets. There’d be real incentive at the high end to maximise ‘performance’, as well as at the other end: low-power & embedded systems.
I’m more talking about Commercial Off the Shelf and small- to mid-size installations: - the things people run every day and suffer from slow response times. -- Steve Jenkin, IT Systems and Design 0412 786 915 (+61 412 786 915) PO Box 38, Kippax ACT 2615, AUSTRALIA mailto:sjenkin at canb.auug.org.au http://members.tip.net.au/~sjenkin From coff at tuhs.org Sat Oct 18 14:11:03 2025 From: coff at tuhs.org (Lars Brinkhoff via COFF) Date: Sat, 18 Oct 2025 04:11:03 +0000 Subject: [COFF] [TUHS] To NDEBUG or not to NDEBUG, that is the question In-Reply-To: <08014FB9-483A-4ED7-BE5B-BC06D3EA24C6@canb.auug.org.au> (steve jenkin via COFF's message of "Sat, 18 Oct 2025 12:44:02 +1100") References: <08014FB9-483A-4ED7-BE5B-BC06D3EA24C6@canb.auug.org.au> Message-ID: <7wplak3l48.fsf@junk.nocrew.org> Steve Jenkin wrote: > How many computing workloads are now CPU limited, > and can’t afford run-time Sanity Checking in Userland? At my day job we have compiled with -g -O0 from day one, and we are not eager to change. I suppose if the project management starts to worry about CPU load or memory shortage, then we'll turn on the optimizer. We have joked about adding ballast to the application, so we can score an easy win when someone complains it's too big. From coff at tuhs.org Wed Oct 22 07:44:18 2025 From: coff at tuhs.org (Warren Toomey via COFF) Date: Wed, 22 Oct 2025 07:44:18 +1000 Subject: [COFF] B compiler for Linux/macOS Message-ID: Hi all, I got this e-mail from Serge. I asked and he was happy for me to share the e-mail with you. Cheers, Warren ----- Forwarded message from Serge Vakulenko ----- Dear Warren, I hope this email finds you well. Although we've never met in person, my name is Serge. I'm a software developer based in the San Francisco Bay Area, with a deep passion for computer history. A few years ago, I came across the source code for the B compiler, which was reverse-engineered by Robert Swerczek. 
Intrigued by the challenge of adapting it for contemporary systems, I developed a full-featured B compiler that generates intermediate representation (IR) code for LLVM. This allows it to produce native binaries for Linux or macOS across the x86_64, ARM64, and RISC-V architectures. The compiler itself is implemented in Go (approximately 3,000 lines of code), with a lightweight runtime library in C (under 400 lines). I've kept the API as faithful as possible to the original PDP-7 implementation, enabling direct compilation of files like b.b without modifications.

Here is the project: https://github.com/sergev/blang

Your insightful article on restoring the PDP-7 to run Unix has always inspired me, so I wanted to share this project with you. It's exciting to think that the B language, a foundational piece of computing history, is now accessible to modern developers.

Best regards,
Serge Vakulenko

----- End forwarded message -----

From coff at tuhs.org Mon Oct 27 15:47:07 2025 From: coff at tuhs.org (segaloco via COFF) Date: Mon, 27 Oct 2025 05:47:07 +0000 Subject: [COFF] Some Famicom/NES Utilities in the UNIX Tradition Message-ID:

For those whom this sort of thing may interest, I wanted to take the opportunity to share some tools I've been tinkering with lately, as well as a little background about them. The two main sets are at:

https://gitlab.com/segaloco/misc/-/tree/master/fc_tools

and

https://gitlab.com/segaloco/smb3/-/tree/master/tools

The former are tools general to the Famicom/NES; the latter are tools more specific to Super Mario Bros. 3, my disassembly of which has served as the testbed for developing these and other tools. I share these not only because they fit a number of different needs relevant to both development and reverse engineering of NES games, but also because of the influence the UNIX philosophy has had on the design patterns and decisions I've made.
The bulk of these tools act as filters, specifically so that they can be strung together into pipelines, as is tradition. Furthermore, where a concern matched closely enough with an existing UNIX utility, I used that utility as the interface model for my own. For instance, my ddnes(1) utility derives its argument syntax directly from dd(1) and, similar to how dd(1) abstracts disk blocks and offers some basic conversions like ASCII to EBCDIC, allows for specifying the abstract mapping scheme of the iNES images being dumped from. This sort of replication of familiar UNIX interfaces has significantly lowered the cognitive load not only of remembering flags and options, but also of contemplating the logical structure of pipelined operations. I simply do the same thing I would do for a more generic data operation, except I swap in my tools where necessary.

In some ways I owe it to TUHS and the larger community surrounding these UNIX history efforts that these tools exist at all. The field of video game reverse engineering is what I cut my teeth on as a tech person, and that field has for a long time been dominated by the Windows world: graphical applications, complexity, closed-source solutions, and so on. In other words, being a ROM hacker on weird UNIX platforms is a lonely, relatively DIY situation compared with the same in the Microsoft Windows ecosystem. Learning more about UNIX, and more importantly the UNIX philosophy, through discussions here, historical preservation, and the study of old manuals and source code has thus had an outsize influence on my desire to produce what, in my obviously biased opinion, is a quite comfortable development and reverse engineering environment for the NES on UNIX. In many ways I was inspired by the same motivation UNIX was developed under: to create a simpler, more intuitive technical environment that avoids the needless complexity of many of the more established toolkits and workflows.
Only time will tell if my tools see uptake in the niche communities they concern, but I felt some appreciation was in order for the fact that UNIX impacted my development of this toolkit on so many levels.

- Matt G.

P.S. If you're someone who tinkers with this sort of stuff and has questions or suggestions, I'm always happy to discuss the finer points. Licenses are provided with the usual disclaimers. Know that there isn't much error checking, so don't feed these bad data and redirect output into a precious file without a backup plan. You have been warned: these are not hardened for production workloads.

From coff at tuhs.org Mon Oct 27 21:10:20 2025 From: coff at tuhs.org (Cameron Míčeál Tyre via COFF) Date: Mon, 27 Oct 2025 11:10:20 +0000 Subject: [COFF] Some Famicom/NES Utilities in the UNIX Tradition In-Reply-To: References: Message-ID:

Hi Matt,

Sounds awesome. Wish I'd had tools like that 40+ years ago when I used to reverse engineer Z80 machine code in commercial games to figure out cheats! I used to poke a short routine into unused memory and run it to scan through the game code, searching for likely things such as the lives-remaining counter, the lose-a-life routine, etc. It's great to know that people like you are still doing stuff like that.

Best regards,
Cameron