## Recursion is my friend.

I went back to the recursive version of "drop" in value.c. The test/check now runs about 2% faster.

Formerly I was aiming to eliminate all recursive C routines in Fexl, in line with my goal to produce a "JPL-compliant" version of Fexl. I am now shelving that goal, at least for now.

Eliminating recursion requires simulating the system stack, but that makes the code more complex and slower. There is no way doing my own stack can compete with the built-in system stack.

The non-recursive drop routine was acceptably fast and not terribly complex, but things got worse when I tried removing recursion from the "subst" routine in type_sym.c. I got it to work by translating substitution patterns into linear vectors of opcodes, but it was much more complex and much slower.

That was bad enough, but imagine removing recursion from "eval" and the parser. No thank you. Recursion is my friend.

## Outline of non-recursive version of subst in type_sym.c

I have been working on a "JPL-compliant" version of Fexl. The JPL standard forbids the use of recursive functions in C. Currently I use this lambda substitution routine in type_sym.c, which is recursive:

```// Use pattern p to make a copy of expression e with argument x substituted in
// the places designated by the pattern.

static value subst(value p, value e, value x)
{
if (p == QF) return hold(e);
if (p == QT) return hold(x);
return V(e->T,subst(p->L,e->L,x),subst(p->R,e->R,x));
}
```

The trick is to make that routine non-recursive to be in compliance with JPL. Currently I represent patterns as trees, e.g.:

```A(A(hold(QF),hold(QT)),A(hold(QT),hold(QF)))
```

In the non-recursive version of subst, I will represent patterns as a linear series of op codes, where each op code is of type:

```typedef void (*opcode)(void);
```

For the sake of this discussion I represent each op code using the letters "ATF.", but I don't actually use letters in the implementation. So an example pattern might look like:

```"AAFT.ATF.."
```

When the parser builds a lambda pattern, it will allocate a fixed work area to be used for substitution with that pattern at run time. The size of this work area is known in advance based on the op codes in the pattern.

Within the work area I use two stacks, and input stack (si) and an output stack (so). However, I combine the two stacks into a single array (work), with (so) followed by (si).

When the subst routine is called, it follows the op codes in the pattern from left to right, and each op code does a very simple operation in the (work) array. The routine is branchless, with the sole exception of checking loop termination.

To determine the requisite size of the total work area (work), and the embedded stacks (so) and (si) within that work area, I have analyzed the "highwater" behavior of any valid sequence of op codes.

Let nA, nT, nF be the number of 'A', 'T', 'F' op codes in a pattern. Then the minimum sizes of the (so) and (si) stacks are:

```size_so = nT + nF
size_si = 1 + 2*nA
```

For example with p = "AAFT.ATF.." we have:

```nA = 3
nT = 2
nF = 2
```
So in that case:
```size_so = 4
size_si = 7
```
That means I can allocate a single array (work) of size (size_so + size_si) and define logical offsets into that array for the output and input stacks:
```so = work + 0;        // output stack, size size_so
si = work + size_so;  // input stack,  size size_si
```

Note by the way that the total size (size_so + size_si) is equal to 1 plus the number of op codes in the pattern.

In practice I don't actually define the (so) and (si) stack pointers. Intead I use two index variables into the main (work) array:

```unsigned int no = 0;       // offset of output stack
unsigned int ni = size_so; // offset of input stack
```

By incorporating the offset size_so into the index ni, I don't need (so) and (si), and all the operations just do things in the (work) array.

## Operation

Here is the test harness I created for the new subst routine, in a file called "test_pattern.c".

```#include <value.h>

#include <basic.h>
#include <stddef.h>
#include <memory.h>
#include <output.h>
#include <show.h>
#include <string.h>
#include <test_pattern.h>
#include <type_num.h>

// Lambda substitution engine

typedef void (*opcode)(void);

struct pattern
{
unsigned int size;    // size of the ops and work vectors
unsigned int size_so; // size of the output stack within work
opcode *ops;
value *work;
};

static value *work;     // current work area
static unsigned int no; // offset of output stack
static unsigned int ni; // offset of input stack
static value curr_x;    // current subject value x to be substituted

// Push right and left sides of top input.
static void op_A(void)
{
value E = work[ni];
work[ni+1] = E->R;
work[ni+2] = E->L;
ni += 2;
}

// Keep the current x.
static void op_T(void)
{
work[no] = hold(curr_x);
ni--;
no++;
}

// Keep the top input.
static void op_F(void)
{
work[no] = hold(work[ni]);
ni--;
no++;
}

// Combine left and right outputs into new output.
static void op_dot(void)
{
value E = work[ni];
value L = work[no-2];
value R = work[no-1];
work[no-2] = V(E->T,L,R);
ni--;
no--;
}

// Use pattern p to make a copy of expression e with argument x substituted in
// the places designated by the pattern.

static value subst(struct pattern *p, value e, value x)
{
work = p->work;
no = 0;
ni = p->size_so;
curr_x = x;

work[ni] = e;

// I use a do loop because I know the first op code cannot be 0.
{
opcode *ops = p->ops;
opcode op = *ops;

do
{
op();
op = *(++ops);
}
while (op);
}

return work[0];
}

// The following functions are only for testing patterns expressed as strings.
// They won't be in the production code.

static unsigned int map_chars_to_ops(const char *s_ops, opcode *ops)
{
unsigned int size_so = 0;
while (1)
{
switch (*(s_ops++))
{
case 'A':
*(ops++) = op_A;
continue;
case 'T':
*(ops++) = op_T;
size_so++;
continue;
case 'F':
*(ops++) = op_F;
size_so++;
continue;
case '.':
*(ops++) = op_dot;
continue;
}
*ops = 0; // sentinel
return size_so;
}
}

// Assumes the s_ops string is properly formed.
static struct pattern *string_to_pattern(const char *s_ops)
{
size_t size = 1 + strlen(s_ops);
value *work = new_memory(size * sizeof(value));
struct pattern *p = new_memory(sizeof(struct pattern));
opcode *ops = new_memory(size * sizeof(opcode));
unsigned int size_so = map_chars_to_ops(s_ops,ops);

*p = (struct pattern){ size, size_so, ops, work };
return p;
}

static void free_pattern(struct pattern *p)
{
free_memory(p->ops, p->size * sizeof(opcode));
free_memory(p->work, p->size * sizeof(value));
free_memory(p, sizeof(struct pattern));
}

static void try_pattern(const char *s_ops, value e, value x)
{
struct pattern *p = string_to_pattern(s_ops);
value r = subst(p,e,x);

put("p = ");put(s_ops);nl();
show_line("e = ",e);
show_line("x = ",x);
show_line("r = ",r);
nl();

free_pattern(p);
drop(r);
drop(e);
drop(x);
}

void test_patterns(void)
{
try_pattern
(
"T",
hold(QI),
Qnum(3)
);
try_pattern
(
"F",
hold(QI),
Qnum(3)
);
try_pattern
(
"AFT.",
A(hold(QI),hold(QI)),
Qnum(3)
);
try_pattern
(
"ATF.",
A(hold(QI),hold(QI)),
Qnum(3)
);
try_pattern
(
"AAFT.ATF..",
A(A(hold(QI),hold(QI)),V(type_I,hold(QI),hold(QI))),
Qnum(3)
);
}
```

The output is:

```p = T
e = [I]
x = [num 3]
r = [num 3]

p = F
e = [I]
x = [num 3]
r = [I]

p = AFT.
e = [A [I] [I]]
x = [num 3]
r = [A [I] [num 3]]

p = ATF.
e = [A [I] [I]]
x = [num 3]
r = [A [num 3] [I]]

p = AAFT.ATF..
e = [A [A [I] [I]] [I [I] [I]]]
x = [num 3]
r = [A [A [I] [num 3]] [I [num 3] [I]]]

```

## Efficiency

The use of opcodes generates a very tight loop in the subst routine. Consider the C code that loops through the op codes:

```    do
{
op();
op = *(++ops);
}
while (op);
```

In the assembler output, that loop becomes:

```    jmp .L18
.p2align 4,,10
.p2align 3
.L23:
.L18:
call    *%rax
movq    (%rbx), %rax
testq   %rax, %rax
jne .L23

```

That was assembled with:

```gcc -S -I . -O3 test_pattern.c
```

## Idea: Generating custom ASM code on the fly

At some point I could generate custom ASM code for any given pattern and jump right into it with a single return address in a register. But that has platform dependencies, and I should start with the op code technique anyway.

## Pure Functions Sometimes Considered Harmful

In the Fexl project I have a couple of tests b15 and index_C which render output with nested indentation. I was using a "pure functional" style, but not anymore. I just published this commit of the Fexl project which eliminates that complexity, simplifying the code by using stateful output routines.

That eliminates the "pure functional" output style, which took great pains to build a nested list structure that was only sent to the output device at the very end. This required passing around a hidden "tab" parameter in a monadic fashion, the use of a "next" continuation parameter, and semicolons to ensure that the tail of a structure was threaded in properly.

Now when you call routines like say, nl, put, and indent, the effects of those calls are going directly into a file handle or other target as they run. You are no longer building up a lazy nested piece of data which is later sent to an output target. All the complexity of the "pure" method vanishes like a bad dream.

This drastically simplifies the code supporting the b15 and index_C tests. I also use the technique to great advantage in the demo.ray project.

This illustrates a principle which I title:

"Pure Functions Sometimes Considered Harmful"

That's clearly a riff on Dijkstra's famous Go To Statement Considered Harmful. I consider the use of pure functions as sometimes harmful.

The pure functional style still has its place in certain aspects of a program, but for input and output it's easier just to cut to the chase and deal with the machine on its own terms. In those cases, "var" is your friend — and it can even create more opportunities for implementing routines directly in C, as I did with the parsing primitives in stream.c in the Fexl project.

I am also finding the var technique useful for "throughput". In one major project I was passing a single immutable state parameter around everywhere, and I greatly simplified the code by keeping that state in a handful of vars.

The good thing about "var" is that it is controlled mutability, not the haphazard and dangerous kind where you modify nested data structures in place. In Fexl the data structures are still immutable, but you use a var to point to a new version when you make a state change. I think this strikes the right balance between pure functions and routines with side effects.

## Running Fexl on Windows

Every once in a while I've wondered what it would take to port Fexl to Windows. I did consider doing a completely native implementation — after all, I have written native C code on Windows before, though that was back in the late 90s. I'm not thrilled with the development environment compared to Linux, but I could probably manage something and start slogging through it.

The Linux version includes features that use all sorts of Linux system calls such as usleep, exec, dup, fopen, setrlimit, fork, dup, pipe, and many others. I would like the Windows version to behave exactly like the Linux version, or as close as possible.

Now consider just the fork and exec calls alone. I could probably port that using CreateProcess on Windows, but there could be subtle semantic differences. Then how would I replicate the behavior of all those other system calls, even approximately? For example, What is the equivalent of setrlimit, and how "equivalent" is it, really?

I did experiment with Cygwin, and it's pretty interesting. I was able to build Fexl on Windows under Cygwin, and the checks pretty much passed, with the exception of things like test/stats.fxl which call the rand function to get some weakly random numbers. I mean I do specify a fixed seed, but the generated sequence is still different because I suppose Cygwin uses different code for rand.

So it's a little sloppy under Cygwin, but oh well. Then there's the issue of getting Fexl to run outside the Cygwin terminal. They have you bind your application with a special Cygwin DLL. I never quite got that far because I had other priorities.

Today I was using a Windows machine for completely unrelated reasons and I decided to revisit the issue of porting Fexl to Windows. As I contemplated the nature of computing, I was struck with the obvious fact that a Linux machine and a Windows machine are essentially both just physical devices with patterns of electrical fields fluctuating rapidly within them. Although the hardware is virtually identical, those patterns make all the difference in their outward behavior — and in some cases, misbehavior.

All that philosophical musing suggested one solution: use a virtual machine. So I started looking at VirtualBox (in "seamless" mode) and VMware (in "unity" mode). I started going through the installation of VirtualBox, and it required C++ 2019. So I started to install Visual Studio, and as it downloaded over 3 GB of data, I perused the internet for other articles on virtual machines.

That's when I stumbled on a little product from Microsoft called "Windows Subsystem for Linux", and that's when I found a solution for porting Fexl to Windows.

### WSL — Windows Subsystem for Linux

Windows Subsystem for Linux ("wsl") is a highly integrated virtual machine which supports multiple Linux style distributions running on Windows. Installing it is roughly just running this command in the Power Shell, and restarting the computer:
```wsl --install
```
After that, you'll find a new app called "Ubuntu" which you can search in the Start menu and pin to the task bar. You bring up an Ubuntu window, and you're in bona fide Linux shell. You can use vim, you can install the gcc compiler with "apt get install" -- the works. It is Linux running on the machine.

If you have any troubles, you probably just need to "Enable Virtualization" in the BIOS, using the standard F1/F2/Del/Alt-F4/whatever trick during bootup to get to that option screen. But on my machine I found that Enable Virtualization was enabled by default.

Well then I was off to the races. I "git cloned" the Fexl code and the tests ran perfectly. I even cloned a graphics application couple of us are developing which uses Fexl to generate C code that uses the raylib library to render a 3D graphic scene.

Everything "just worked," to the point where I felt just as productive developing and testing on the Windows machine as I did on my Linux machine. At times it felt surreal, but why should it? After all, it's all the same basic hardware, it's just a matter of getting the right electrical field patterns pulsing through them. Easier said than done: this took a lot of hard and brilliant work from the people at Microsoft, and I'm grateful.

### wsl.exe

So then I thought, fine, I can run these Fexl programs from within the Ubuntu window, but is there any way I could make something you could double-click from the Windows desktop, or click in the Windows start menu, that would bring up one of these programs without having to go into the Linux environment?

Well of course there is, using the little marvel called called "wsl.exe"! That is a Windows executable that can run a Linux command within a WSL virtual machine. You can run it from the ordinary Windows command prompt, or from a batch script. For example to run that graphics demo I mentioned above, I ran this from the command prompt:

```wsl.exe ~/code/fexl/bin/fexl project/demo.ray/src/demo_9457.fxl
```

I also wrapped that up in a batch file "demo_9457.bat" so I could run it easily.

There's a lot of hot development happening on Linux these days, and I think WSL provides a seamless way to take advantage of that on Windows machines. It might even reduce the motivation for end users and even developers to switch fully to Linux. I still prefer my System76 laptop running native PopOS, but I felt right at home using Linux under Windows, courtesy of WSL.

## Compiling key-value pairs into C code

I wrote some code that “compiles” a list of key-value pairs into C code. The keys and values must be strings, and I don’t yet do anything to detect special characters which must be escaped, or multi-line strings.

Here’s a test suite that tries a few different lists:

Here’s the reference output for that:

Here’s the Fexl code which does the actual code generation work:

The Fexl function compile_pairs generates a C function called "lookup". The lookup function takes a pointer to the key chars and a separate length. If you have NUL terminated data you can call strlen before calling lookup.

As far as I can tell the generated code is “optimal” in the sense that it does the fewest number of individual character comparisons needed to reach a point at which strncmp yields a definitive answer.

By the way, I could probably use “switch” instead of successive comparisons of the same key character, but I figure the optimizer will already have that covered.

## Functional records

I added a data type for "records." A record behaves like a function that maps string keys to arbitrary values. It is represented internally as a vector of zero or more items, each item a {key val} pair, sorted by key.

The expression (empty) returns the empty record.

The expression (set key val obj) sets key to val in record obj, returning a record like obj but with key mapped to val. It modifies obj inline if there are no other references to it; otherwise it returns a modified copy of obj.

The expression (record_count obj) returns the number of items in the record.

The expression (record_item obj pos) returns the item in record obj at offset pos, starting at zero.

These functions can be used to define record_pairs, which returns the lazy list of pairs in a record, allowing iteration (see test/record.fxl).

I added a chain function, defined by type_chain in basic.c. The expression (chain a b x) returns (a x) if that value is defined, otherwise (b x). This is used to establish default values when a key is not defined in a record or other function.

## Examples

This constructs a simple record value:

```(
set "a" 1;
set "b" 2;
set "c" 3;
set "d" 4;
set "e" 5;
empty
)
```

This overrides some of the earlier specified values:

```(
set "c" 33; # maps "c" to 33 instead of 3
set "b" 22; # maps "b" to 22 instead of 2

set "a" 1;
set "b" 2;
set "c" 3;
set "d" 4;
set "e" 5;
empty
)
```

Here is the commit on GitHub. See test/record.fxl for more examples.

## Extending the standard context

Lately I've been using this function in cases where I want to redefine the standard context std:

```\extend==
(\cx\form
\cx=cx
value;
(@\std\form cx; def "std" std; form);
form
)
```

Example:

```\flintstones=(value; std; use "flintstones.fxl")

extend
(\form
std;
flintstones;
form
) \;

# At this point "std" has been redefined to include the flintstones context,
# exactly as if its definitions were built in.

fred
wilma
```

However, there may be cases where I want to use another module but I don't want that module to see the new definitions. In that case I could keep a handle to the original std context, by giving it a name such as cx_base.

```\flintstones=(value; std; use "flintstones.fxl")

extend
(\form
std;
flintstones;
def "cx_base" std;
form
) \;

# Now I can choose which module can see the new flintstones definitions:

\A=(value; cx_base; use "A.fxl")  # Cannot see flintstones
\B=(value; std; use "B.fxl")      # Can see flintstones
```

That can be particularly useful if the new definitions include some potentially dangerous functions and I don't always want them available.

I will soon include extend in the standard library.

## Modular arithmetic

I wrote this piece on the subject of Modular arithmetic.

## Dynamic Libraries

I added new Fexl functions dlopen and dlsym to read dynamic libraries.

It is now possible for a Fexl program to create and run C code on the fly. To do this it would write the code into a file, compile it into a shared library, load the library with dlopen, grab the new functions with dlsym, and run them.

Here is the commit on GitHub.

## Opinionated Programming

This morning I wrote this piece on the subject of Opinionated Programming.

## Template Expansion Function

I wrote this piece on the subject of Template Expansion in Fexl.

## More on functional components

I added a section "Elaborating on the process" to the article on functional components.

## Restored fexl0 feature

I released version 26.12.0 this morning, which restores the "fexl0" feature to bypass the built-in src/main.fxl library. The comment on the eval_script routine in type_standard.c now reads as follows:

Evaluate the user's script. Read the script from the file named by argv[1] if present, or from stdin otherwise.

If this program is running as "fexl0", it runs your script directly. If it is running as "fexl", it first runs the local script "src/main.fxl", which then runs your script within an extended library context.

The purpose of "fexl0" is to give you the option of bypassing src/main.fxl altogether, defining any extensions beyond the built-in C standard context as you please.

In a future release I will make fexl0 the default behavior, so it always reads your script directly without loading any default library. Predefined libraries such as the current main.fxl will be provided in the lib directory so you can pull them in with a one-liner.

I am making this change because the one-size-fits-all approach to a standard library is a losing proposition. Eventually I might even factor out the currently built-in C functions into independent libraries loaded on demand. Then you could build an entirely separate C library, put it in your own directory somewhere, and load it from within Fexl. That would even give you the power to generate binary code on the fly and run it immediately.

## Some notes on Fexl

Today I posted this comment to a Magmide discussion list. Here's what I wrote.

You might want to check out some of the concepts in my functional programming language Fexl. I have been using it almost exclusively for years in my business, which is accounting for investment partnerships.

Fexl is not statically typed. I figured if I wanted to build a static type system, I would do that in Fexl itself, which would start off untyped. So far I have not been inclined to do static typing. Good systematic development with regression testing and full coverage have worked very well for me. Personally, I find statically typed languages to be too burdened with cumbersome declarations.

Although Fexl is not "purely functional," you can make any program be exactly as "pure" as you want it to be. For example, I/O is straightforward, so the "Hello, world" program is simply; say "Hello, world." There is no "monad" there, just a call to the "say" routine. To make that program purely functional, you would write a function that returned the string "Hello, world." Then a higher-level "impure" layer would be tasked with applying "say" to that result.

If you want to do "memoized" functions, Fexl has routines for doing mutable variables. This can be nice for optimizing naive versions of recursive functions like Fibonacci, or the Partition count function which is far more self-recursive. (Note: I'm personally in favor of a more purely functional approach to memoization, where an immutable cache table is explicitly passed around in and out of the function, but sometimes the grungy mutable approach will save you in a pinch.)

Fexl represents recursion explicitly using the fixpoint operator, which I denote by "@". The nice thing about explicit fixpoint recursion is that a call to a recursive function can be expanded directly in place -- i.e. referential transparency applies here as well. You can also identify some interesting optimizations where some parameters are fixed and do not need to be passed into recursive calls. (Note: Tail recursion "just works" because of how I do parameter substitution. I don't have to do anything special to get it.)

Fexl also has a very useful feature for idempotent "once" calculations. If you have an expression such as (f x y) buried somewhere, and maybe it takes a long time to run or even has side effects you only want to happen once, you can change the expression to (\= f x y) and then it only runs once, if you happen to call it. I cannot emphasize enough how useful and convenient that is for my real project work.

My design goal for Fexl was to have a language which is as close to pure lambda expressions as possible, with additional syntax added only to make it practical. Most of the additional syntax has an equivalent plain lambda expression. Another goal was to implement Fexl in the smallest feasible plain C code, which I think I've done.

I'll leave it there for now. If you'd like me to elaborate anything further, please let me know. You may find some of the ideas helpful in your efforts. For example, I think I've done very solid work on concepts such as parameter substitution, and the closely related concept of modularity, which Fexl handles using functions known as "contexts."