Tuesday, 17 April 2018

egg Functions

My first test script for functions in egg was everyone's favourite chestnut: the Fibonacci sequence.

  int fibonacci(int n) {
    if (n < 2) {
      return n;
    }
    return fibonacci(n - 1) + fibonacci(n - 2);
  }
  print(fibonacci(10));

It does indeed print out "55"!

In egg, functions defined like this are actually special cases of callable objects. The script declares an identifier "fibonacci" which is initialised with an instance of an object that supports the "call" operation: in this case, taking a single integer parameter and returning an integer.

At present, there is no notion of read-only variables in egg, so it's possible to subsequently assign a different function to "fibonacci":

  int fibonacci(int n) {
    if (n < 2) {
      return n;
    }
    return fibonacci(n - 1) + fibonacci(n - 2);
  }
  int zero(int n) {
    return 0;
  }
  fibonacci = zero;
  print(fibonacci(10));

This prints out "0". It's a good demonstration that functions in egg are considered first-class entities, but it might violate the principle of "least surprise" for newcomers.
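For comparison, here is a rough C++ analogue of the same experiment. It is only a sketch: std::function stands in for egg's callable objects, which is my assumption about the closest C++ equivalent, and the helper name "before_and_after" is made up for illustration.

```cpp
#include <functional>
#include <utility>

// Plain function with the same signature as "fibonacci".
int zero(int) {
    return 0;
}

// Hypothetical helper: calls "fibonacci" before and after reassigning it.
std::pair<int, int> before_and_after() {
    // A variable holding a callable object, loosely like the egg identifier.
    // The lambda captures the variable by reference so it can call itself.
    std::function<int(int)> fibonacci = [&fibonacci](int n) {
        return n < 2 ? n : fibonacci(n - 1) + fibonacci(n - 2);
    };
    int before = fibonacci(10);  // uses the recursive definition
    fibonacci = zero;            // rebind the same variable to another callable
    int after = fibonacci(10);   // now calls zero()
    return {before, after};
}
```

Calling before_and_after() yields 55 for the first call and 0 for the second, which matches what the egg scripts print.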

egg "For" Statements

Recently, I got my first egg script running. Here it is:

  var s = 0;
  for (var i = 0; i < 100; ++i) {
    s += i;
  }
  print(s);

You'll be unsurprised to hear that this printed out "4950" (it was a pleasant surprise to me when it first happened, though, I can tell you!)

There are three things worth mentioning in this script.

Firstly, the "var" keyword initiates type inference for variables "s" and "i". In both cases, "int" is inferred (there are no unsigned integers in egg).
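The nearest C++ equivalent of this inference is "auto". The sketch below assumes that egg's "var" behaves like "auto" for integer literals, which is only an analogy; the function name "sum_below" is invented for the example.

```cpp
#include <type_traits>

// Hypothetical helper mirroring the egg script with "auto" in place of "var".
int sum_below(int limit) {
    auto s = 0;  // deduced from the literal 0
    for (auto i = 0; i < limit; ++i) {
        // Compile-time check that the loop counter is a plain int.
        static_assert(std::is_same<decltype(i), int>::value, "i is deduced as int");
        s += i;
    }
    // Compile-time check that the accumulator is a plain int too.
    static_assert(std::is_same<decltype(s), int>::value, "s is deduced as int");
    return s;
}
```

With a limit of 100 this returns 4950, the same total as the egg script.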

Secondly, the "print" function is a built-in method that will probably be removed, but it is useful for testing at the moment.

Finally, the syntax of "for" statements turns out to be non-trivial. As in C++, I adopted two forms:

  for (before ; condition ; step) { ... }
  for (iterator : collection) { ... }

The latter form is for iterating around collections, which we won't discuss here.

My main concern with the former form of the "for" loop is: which syntactic elements are valid for "before", "condition" and "step"?

The easiest of the three clauses is "condition". It's optional, but I only allow expressions that evaluate to Boolean values. This is similar to all the main curly-brace languages (C/C++/Java/JavaScript).

The "before" statement is also optional, but it must be a statement. This includes variable declarations: the scope of such variables is limited to the "for" statement, including the "condition" and "step" clauses. Again this is similar to the other languages (if we ignore JavaScript's problematic scoping rules).

The "step" statement is optional too, but cannot be a declaration. As mentioned previously, the increment and decrement operators are supported in egg purely to allow the classic for-loop idiom seen in this example.
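The scoping and optional-clause rules described above are the ones C++ already follows, so they can be sketched in C++. The function name "scoped_counter" is hypothetical, and this shows C++ behaviour rather than the egg implementation.

```cpp
#include <utility>

// Hypothetical helper: returns the outer "i" and the total number of passes.
std::pair<int, int> scoped_counter(int limit) {
    int i = -1;      // outer variable, deliberately shadowed by the loop below
    int passes = 0;
    // The "i" declared in the first clause is visible in the condition,
    // the step and the body, but nowhere else.
    for (int i = 0; i < limit; ++i) {
        ++passes;
    }
    // All three clauses omitted: the loop only ends via "break" in the body.
    for (;;) {
        if (passes >= 2 * limit) {
            break;
        }
        ++passes;
    }
    return {i, passes};  // the outer "i" was never touched
}
```

For a limit of 5 the outer "i" is still -1 afterwards and the two loops make 10 passes between them.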

But then I got a bit confused with which egg statements should be valid for "before" and "step". Can you use "break" and "continue"? If so, what do they do?

So I asked my C++ compiler, and it says that I cannot use "break" or "continue", but I can "throw" exceptions. The reason you cannot "break" or "continue" in "before" and "step" clauses is that they are just that: statements. In C++, the "step" clause must be an expression, and the "before" clause must be either an expression or a declaration.

But why can you "throw" exceptions in those clauses in C++?

  for (auto i = 0; i < 100; throw "Bang!") {} // Valid C++

Well, it turns out that what I think of as throw statements are actually throw expressions (of type "void") in the formal syntax. It's a cul-de-sac that others have found themselves in too!
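Because "throw" is an expression in C++, it can appear anywhere an expression of type "void" is tolerated, not just in a "for" header. A small sketch (the function names "checked" and "run_loop" are mine, not from any library):

```cpp
#include <stdexcept>

// "throw" as one arm of a conditional expression: the result type comes
// from the other arm, so this still returns an int.
int checked(int n) {
    return n >= 0 ? n : throw std::invalid_argument("negative");
}

// "throw" as the step clause of a "for" statement, as in the example above.
// Returns how many times the loop body ran before the exception escaped.
int run_loop() {
    int iterations = 0;
    try {
        for (auto i = 0; i < 100; throw std::runtime_error("Bang!")) {
            ++iterations;
        }
    } catch (const std::runtime_error&) {
        // The step clause runs after the first pass, so we arrive here
        // with exactly one iteration completed.
    }
    return iterations;
}
```

run_loop() returns 1: the body executes once, then the step clause throws before the condition is tested again.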

As egg is meant to be an easy-to-learn language, with few surprises, I decided to classify "throw" as a statement and explicitly forbid it in expressions and "before" and "step" clauses of "for" statements.

Tuesday, 3 April 2018

ZX Spectrum Flags of the European Union

Someone, somewhere, out there on the Internet, is looking for the flags of the twenty-eight (current) member countries of the European Union ... in the original Sinclair ZX Spectrum SCREEN$ format. Surely...

Sunday, 1 April 2018

Mappa Edmundi de Waal

Personally, I think Edmund missed a trick...


(original by Edmund de Waal)

Friday, 23 March 2018

A Prime Example of JavaScript Golf

This is why you shouldn't play JavaScript golf:

for(a=[1];!a[999];)/^(11+)\1+$/.test(a+=1)||print(a.length)

Thursday, 8 March 2018

Vexatious Parses in C++

As part of my work on the egg computer language specification, I've been looking into parsing curly-brace-type languages. There are a number of cul-de-sacs in these language specifications. Here's one from C++ I've been struggling with today:
    int a = 1;
    int b = 2;
    int c = a-b;
What's the value of "c"? Obviously, it's minus one. But what about this:
    c = a--b;
My Microsoft compiler tells me that this is a malformed expression:
    syntax error: missing ';' before identifier 'b'
But the following is fine:
    c = a---b;
This sets "c" to minus one and decrements "a". Honest.
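If you don't believe me, here is a minimal check that wraps the same three lines in a function (the name "triple_minus" is just for illustration):

```cpp
#include <utility>

// Returns the final values of "c" and "a" after evaluating a---b.
std::pair<int, int> triple_minus() {
    int a = 1;
    int b = 2;
    int c = a---b;   // lexed as "a-- - b": uses the old value of a, then decrements it
    return {c, a};
}
```

The result is c == -1 (that is, 1 - 2) with "a" left at 0.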

Here's a list of parses:
    a-b      // Parsed as "a - b"
    a--b     // Fails to compile: missing ';' before identifier 'b'
    a---b    // Parsed as "a-- - b"
    a----b   // Fails to compile: '--' needs l-value
    a-----b  // Fails to compile: '--' needs l-value

    a- -b    // Parsed as "a - -b"
    a- --b   // Parsed as "a - --b"
    a-- -b   // Parsed as "a-- - b"
    a- - -b  // Parsed as "a - - -b"
The compiler is obviously "greedy" when parsing operators; so, in the absence of white-space, it's easy for it to overlook an alternative interpretation:
    a--b     // COULD be parsed as "a - -b"
    a----b   // COULD be parsed as "a-- - -b"
    a-----b  // COULD be parsed as "a-- - --b"
I expect the compiler-writers have their hands tied by the formal language specification. But, for a new language like egg, I don't have any such restrictions.
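The greedy behaviour is easy to reproduce with a toy tokenizer. This is a deliberately tiny sketch, not how any real compiler is written: it assumes the only operators are "-" and "--", treats any other run of non-space characters as an identifier, and the function name "tokenize" is hypothetical.

```cpp
#include <string>
#include <vector>

// Toy greedy ("maximal munch") lexer for identifiers, "-" and "--".
std::vector<std::string> tokenize(const std::string& src) {
    std::vector<std::string> tokens;
    std::size_t pos = 0;
    while (pos < src.size()) {
        if (src[pos] == ' ') {
            ++pos;  // whitespace only separates tokens
            continue;
        }
        if (src[pos] == '-') {
            // Greedy step: always take "--" when two minus signs are adjacent.
            if (pos + 1 < src.size() && src[pos + 1] == '-') {
                tokens.push_back("--");
                pos += 2;
            } else {
                tokens.push_back("-");
                ++pos;
            }
            continue;
        }
        // Anything else: consume up to the next space or minus sign.
        std::size_t start = pos;
        while (pos < src.size() && src[pos] != ' ' && src[pos] != '-') {
            ++pos;
        }
        tokens.push_back(src.substr(start, pos - start));
    }
    return tokens;
}
```

Feeding it "a-----b" produces the tokens a, --, --, -, b, which is exactly the split that leads to the "'--' needs l-value" error, whereas "a- -b" produces a, -, -, b.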

I decided that prefix and postfix increments/decrements as expressions are bad things. This is mainly due to problems associated with side-effects and evaluation ordering. Consider:
    int a = p[++i] + p[i++]; // Not allowed
However, I think I will retain the prefix increment/decrement statements:
    ++i; // Allowed
    --i; // Allowed
    i++; // Not allowed
    i--; // Not allowed
This permits the idiomatic counter-based loop:
    for (i = 0; i < count; ++i) {
        ...
    }
The reasons for only allowing the prefix versions are two-fold:
  1. It makes the language specification much less ambiguous; and
  2. People still harp on about prefix increments/decrements being slightly faster than their postfix variants, which is why they are "preferred" for looping.
Whilst I was at it, I also decided I can probably do without the unary '+' operator. That gets rid of the truly vexatious:
    c = a+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+b;

Tuesday, 27 February 2018

What is Egg?

“Egg” is an idea I’ve been thinking about for a long time. Here’s the background…

At work, over the last few months, I’ve used many computer languages:

  1. C++
  2. C#
  3. Java
  4. JavaScript
  5. Clojure
  6. Python
  7. PowerShell
  8. Windows batch commands
  9. Bash

and almost as many build/configuration file formats:

  1. Makefile
  2. JSON
  3. XML
  4. YAML
  5. INI

I appreciate that domain-specific languages have their place, but often runtime performance is not an issue, so using a general-purpose language would be more than adequate. Constantly having to context-switch between different languages and paradigms is exhausting; not to mention the numerous bugs caused by forgetting the specifics of each set of syntaxes, escape sequences, library routine quirks and so on.

What if there was a simple language that was powerful enough to get the job done without having to remember too many subtleties of the language?

Another issue I have with many languages is the lack of simple interoperability. If I want to call a C++ routine from Clojure, I’m going to have to jump through hoops.

Similarly, if you develop a prototype in one language, you often have to “productionize” it by converting it to another. This is a great source of bugs.

What if there was a language that you could transpile into other languages?

Even if the transpiled code was purely used to get a unit test framework up and running before refactoring, this would greatly mitigate the introduction of bugs.

Some of these interoperability issues are due to the frameworks or virtual machines that some of the languages require:

  • Java Virtual Machine
  • .NET Framework
  • and so on

In this regard, I think that JavaScript is quite successful because of its ubiquity: press F12 inside your browser and you have quite a powerful development environment. Running scripts outside of a browser simply requires you to download a zero-install executable such as Node.js.

What if there was a language that ran almost identically on many frameworks and/or virtual machines?

Anecdotally, it seems that Python is gaining ground as a teaching language. I’m not going to knock Python, but it seems strange that there appear to be few other candidates for teaching good software engineering practices.

What if there was a language that could be used for teaching the fundamentals of programming whilst still being useful outside of academic institutions?

Talking of Python, why do computer languages develop to the point where the designers make breaking changes (e.g. Python 2 versus Python 3)? Even venerable C++ is getting a new set of features every three years that’s difficult to keep up with.

What if there was a language that had a relatively stable syntax?

But “egg” isn’t just a computer language specification, it’s:

  • An engine to run scripts written in egg
  • A compiler to generate native code from egg source
  • A set of transpilers to generate other computer languages from egg source
  • A build system (written in egg, of course)
  • A set of core packages to perform common tasks
  • A testing framework
  • A package manager

So, that’s what “egg” is: a personal project to give me an excuse to investigate these issues.