The Ninja Language is:
Procedural
Ninja has functions, but no built-in understanding of complex types.
Simple
Ninja was designed to be easy to parse and simple to build a compiler for.
The compiler requires only two passes.
It should be possible to build a compiler capable of dealing with programs whose binary size is almost as large as the amount of RAM available in the system by streaming the source code, intermediary symbol table, and final binary to/from the disk.
Some quick notes on the conventions used in this document, to avoid confusion before we go further.
Whenever you see a number starting with a $ it represents an 8-bit hexadecimal value.
When you see a number starting with % it represents a 16-bit hexadecimal value.
In tables in this document you’ll see memory addresses referenced.
They are of the form:
ADDRESS (LENGTH)
or
ADDRESS to ADDRESS
where ADDRESS may be an expression.
PC generally refers to the Program Counter, or the offset in a binary.
SP generally refers to the Stack Pointer.
N generally refers to the size of RAM.
Other capitalized letters in address expressions generally refer to the size of some data structure or program instruction, whose meaning must be inferred from the context in which they appear.
Programs represent an entire binary. A Program Scope is created immediately when the compiler begins parsing your source code. Programs can contain Comments, Program Instructions (Includes and Declares), and Functions.
Properties:
Assembly:
If the binary is built using the --os flag, the first instruction in an emitted binary sets up the stack pointer. Otherwise, a Goto to a Function with name “main” taking no parameters is the first instruction.
Global Variables and Functions are grouped into their two buckets and emitted in program compilation in the order they appear in the source code.
Address Range | Instructions |
---|---|
0 (3) | LOAD N-1 to SP |
3 (3) | JUMP to main |
6 (G) | Global Variables |
6+G to P | Functions |
P+1 to N-1-MaxStackSize-1 | Heap (Not Emitted) |
N-1-MaxStackSize to N-1 | Stack (Not Emitted) |
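The heap and stack rows of the table can be checked with a few lines of Python. This is only a sketch of the arithmetic: the function and parameter names (`memory_map`, `ram_size`, `last_function_addr`, `max_stack_size`) are hypothetical and stand in for N, P, and MaxStackSize from the table.

```python
def memory_map(ram_size, last_function_addr, max_stack_size):
    """Return inclusive (start, end) address ranges for the heap and the stack.

    ram_size is N, last_function_addr is P, max_stack_size is MaxStackSize.
    """
    stack = (ram_size - 1 - max_stack_size, ram_size - 1)
    heap = (last_function_addr + 1, stack[0] - 1)
    if heap[0] > heap[1] + 1:
        # The emitted binary would overlap the region reserved for the stack.
        raise ValueError("program and stack do not fit in RAM")
    return {"heap": heap, "stack": stack}


layout = memory_map(ram_size=0x10000, last_function_addr=0x1FFF, max_stack_size=0xFF)
```

Neither region is emitted into the binary, so the compiler only needs these bounds to detect a program that is too large for the target.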
Grammar
Program ::= (EOL | ' ' | Comment | Include | Declare | Function)* EOF
Example
# Include Standard Library #
include "z80.ninja";
# This is a Compile Time Constant.
The value here is the address of the Display I/O port.
Typically you'd use this value from the standard library,
but it's here for the sake of illustration. #
declare word g_display = %0010;
# This is the message we want to print. #
declare byte* g_helloWorld = "Hello World";
# Main Program Entry Point #
function main() {
call print(g_helloWorld);
return;
}
# This function prints a null-terminated string by causing
an I/O Write request to address %0010 with the ASCII
value of the character to print on the data line. #
function print(byte* string) {
@ PRINT_NEXT_CHARACTER;
load string;
if zero goto PRINT_DONE;
write g_display;
increment string;
store string;
goto PRINT_NEXT_CHARACTER;
@ PRINT_DONE;
return;
}
Developer comments are, of course, critical to any well-written program. Comments start with a # and end with a #. They may contain any character other than #, including newlines, so they may be single-line or multi-line. They act like the multi-line comments in C.
Grammar
Comment ::= '#' (AsciiCharacter | EOL)* '#'
Example
# This is a sample comment #
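Because a comment cannot contain a #, a parser can skip one by scanning for the next # character. The helper below is a minimal sketch of that idea in Python; `skip_comment` is a hypothetical name, not part of any Ninja tool.

```python
def skip_comment(source, pos):
    """Return the index just past the comment that starts at source[pos]."""
    if source[pos] != "#":
        raise ValueError("a comment must start with '#'")
    end = source.find("#", pos + 1)
    if end == -1:
        raise ValueError("unterminated comment")
    return end + 1


text = "# first line\nsecond line # declare"
rest = text[skip_comment(text, 0):]
```

The same scan handles single-line and multi-line comments, since newlines are ordinary characters inside the comment.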
Including other files makes it easy to stay organized. Additional files are included in-line, as if the include instruction were simply replaced by the contents of the other file. Like all other Program Instructions, the instruction must be terminated with a semicolon.
The compiler will search for the included file in the same folder as the program file, as well as in the directory specified by the optional --include compiler parameter.
The compiler will not include a file more than once.
Grammar
Include ::= 'INCLUDE' ' ' StringLiteral ';'
Example
# Include the standard library. #
include "z80.ninja";
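The in-line replacement and the include-once rule can be sketched in Python as follows. This is an illustration under assumptions: files are modelled as an in-memory dictionary rather than searched for on disk, the keyword is matched case-insensitively, and `expand_includes` is a hypothetical name. The sketch does not skip include instructions that appear inside comments.

```python
import re

# Matches the Include grammar: keyword, one space, string literal, semicolon.
INCLUDE = re.compile(r'include "([^"]*)";', re.IGNORECASE)


def expand_includes(name, files, seen=None):
    """Return the text of files[name] with every include replaced in-line.

    A file that was already included expands to nothing, so each file
    contributes its contents at most once.
    """
    seen = set() if seen is None else seen
    if name in seen:
        return ""
    seen.add(name)
    return INCLUDE.sub(lambda m: expand_includes(m.group(1), files, seen), files[name])
```

Tracking the set of seen names also prevents infinite recursion when two files include each other.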
The system allows you to refer to values and reason about types using named references (think: variable names and function names).
Properties:
The tuple <Scope, Identifier> must be unique.
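One way to enforce this uniqueness rule is to key the symbol table on the pair itself. The class below is a hypothetical sketch in Python, not the compiler's actual data structure; the scope names used in the usage lines are made up for illustration.

```python
class SymbolTable:
    """Maps (scope, identifier) pairs to a type name, rejecting duplicates."""

    def __init__(self):
        self._symbols = {}

    def define(self, scope, identifier, type_name):
        key = (scope, identifier)
        if key in self._symbols:
            raise KeyError(f"{identifier!r} is already defined in scope {scope!r}")
        self._symbols[key] = type_name

    def lookup(self, scope, identifier):
        return self._symbols[(scope, identifier)]


table = SymbolTable()
table.define("program", "g_display", "word")
table.define("print", "string", "byte*")
```

The same identifier may appear in two different scopes, because only the pair has to be unique.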
Types let the system reason about how to access values at runtime.
The Ninja type syntax is different from most systems. Ninja references should not be confused with C-style pointers. Ours are different, even if they look similar.
In Ninja, we use “*” characters to indicate levels of indirection. “byte*” represents the address of a “byte”. To access the value, one would have to use the “byte*” value to load the actual “byte” value from RAM at the address held in “byte*” (one load-from-RAM instruction, therefore one “*”).
It is impossible to create a variable of type “byte” which can be modified at runtime. A type “byte” would have a “0” level of indirection. Therefore there would be no way for anyone to reference its original location to change it. The only time the type “byte” would be used in a program is when it’s a compile-time constant. Otherwise, all Symbolic References refer to either heap or stack allocated locations in RAM (for Program Variables and Function Variables respectively).
Properties:
Byte
Word
Grammar
Type ::= ('BYTE' | 'WORD') ('*')*
Note: Types are one of two non-terminals in the grammar that require a single-byte look-ahead when parsing. This can be avoided by being clever about the terminal, and using a ‘look back’ in other non-terminal parsers.
Examples
byte
The preceding type would refer to a compile-time constant which does not require loading from RAM because it has been emitted as part of the binary.
byte*
The preceding type would refer to the address of a byte. The compiler would know that one level of indirection is present, so the actual value would have to be loaded from RAM.
Types are automatically de-referenced when type coercion is needed. (The values are loaded from RAM automatically when pushing them to the stack, etc.)
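The Type grammar reduces to a base type plus a count of “*” characters. The Python sketch below splits a type into those two parts; `parse_type` and `is_compile_time_constant` are hypothetical helpers, and matching the keywords case-insensitively is an assumption based on the lower-case examples in this document.

```python
import re

# Type ::= ('BYTE' | 'WORD') ('*')*
TYPE = re.compile(r"(byte|word)(\**)", re.IGNORECASE)


def parse_type(text):
    """Return (base type, level of indirection) for a Ninja type."""
    match = TYPE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a valid type: {text!r}")
    return match.group(1).lower(), len(match.group(2))


def is_compile_time_constant(text):
    """A type with zero levels of indirection can only be a constant."""
    return parse_type(text)[1] == 0
```

Each “*” counted here corresponds to one load from RAM needed to reach the underlying value.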
Symbolic References (variables, constants, functions, parameters) have names. These names are Identifiers.
They must start with an ASCII character in the range [A-Z] and may contain any character in the set [_, -, A-Z, 0-9].
Grammar
Identifier ::= [A-Z][0-9A-Z_-]*
Note: Like Types, Identifiers require a single-byte look-ahead when parsing.
Example
Some_Identifier
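The Identifier grammar maps directly onto a regular expression. The Python check below is a sketch that assumes identifiers are matched case-insensitively, since the grammar lists only [A-Z] while the example uses lower-case letters; `is_identifier` is a hypothetical helper.

```python
import re

# Identifier ::= [A-Z][0-9A-Z_-]*
IDENTIFIER = re.compile(r"[A-Z][0-9A-Z_-]*", re.IGNORECASE)


def is_identifier(text):
    """Return True if text is a well-formed Ninja identifier."""
    return IDENTIFIER.fullmatch(text) is not None
```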
Ninja allows you to specify values of Byte and Word types using several different representations.
Grammar
Value ::= ByteLiteral | WordLiteral | ByteCharacter | StringLiteral
Represents an 8-bit value, hex encoded.
Literal starts with $. There must always be two hex digits.
Assembly
Address Range | Instructions |
---|---|
PC (1) | Byte Value |
Grammar
ByteLiteral ::= '$' [0-9A-F][0-9A-F]
Example
$41
Represents a 16-bit (2 byte) value, hex encoded.
Literal starts with %. There must always be four hex digits.
Assembly
Address Range | Instructions |
---|---|
PC (1) | Most Significant Byte |
PC+1 (1) | Least Significant Byte |
Grammar
WordLiteral ::= '%' [0-9A-F][0-9A-F][0-9A-F][0-9A-F]
Example
%23FE
Represents an 8-bit value, starts and ends with single ticks (“’”). Contains one ASCII character.
Assembly
Address Range | Instructions |
---|---|
PC (1) | Byte Value |
Grammar
ByteCharacter ::= "'" AsciiCharacter "'"
Example
'A'
A null-terminated byte array represented by the ASCII values inside double quotes.
Assembly
Address Range | Instructions |
---|---|
PC (1) | First character |
… (n-3) | … |
PC+n-2 (1) | Last character |
PC+n-1 (1) | 0x00 |
Grammar
StringLiteral ::= '"' (AsciiCharacter)* '"'
Example
"Apple"
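The four Assembly tables for values can be summarised by one encoding routine. The Python function below is a sketch of that mapping; `encode_value` is a hypothetical name, and the sketch does not handle a double quote inside a string literal.

```python
import re


def encode_value(literal):
    """Return the bytes emitted for a Ninja value literal."""
    if re.fullmatch(r"\$[0-9A-F]{2}", literal):
        # ByteLiteral: one byte.
        return bytes([int(literal[1:], 16)])
    if re.fullmatch(r"%[0-9A-F]{4}", literal):
        # WordLiteral: most significant byte first.
        return int(literal[1:], 16).to_bytes(2, "big")
    if len(literal) == 3 and literal[0] == literal[2] == "'":
        # ByteCharacter: the ASCII value of the single character.
        return literal[1].encode("ascii")
    if len(literal) >= 2 and literal[0] == literal[-1] == '"':
        # StringLiteral: ASCII bytes followed by a null terminator.
        return literal[1:-1].encode("ascii") + b"\x00"
    raise ValueError(f"not a valid value literal: {literal!r}")
```

A string of k characters therefore occupies k+1 bytes in the binary, which matches the n bytes from PC to PC+n-1 in the table above.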
Variables define Symbolic References to:
Program and Function Scope variables must always be loaded from RAM before their value can be accessed, so they are always of a reference type.
Grammar
Declare ::= 'DECLARE' ' ' Type ' ' Identifier (' ' '=' ' ' Value)? ';'
Examples
declare word gDisplayAddress = %0010;
The preceding type would refer to a compile-time constant of %0010, the I/O address of the display.
declare byte* gGameLevel = $01;
The preceding type would refer to the address of a byte.
declare word* gGameLevel = %CE05;
The preceding type would refer to the address of a word.
function main() {
declare byte* gGameLevel = $01;
return;
}
The preceding instruction would create a Function Scope variable which is allocated on the stack when the function begins. References to gGameLevel are always by reference since it must be loaded from RAM when accessed.
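The Declare grammar is regular, so a single pattern can pull out the type, the identifier, and the optional initial value. The Python sketch below assumes case-insensitive keywords and identifiers (following the lower-case examples) and does not support a semicolon inside a string value; `parse_declare` is a hypothetical helper.

```python
import re

# Declare ::= 'DECLARE' ' ' Type ' ' Identifier (' ' '=' ' ' Value)? ';'
DECLARE = re.compile(
    r"declare (?P<type>(?:byte|word)\**) "
    r"(?P<name>[A-Z][0-9A-Z_-]*)"
    r"(?: = (?P<value>[^;]+))?;",
    re.IGNORECASE,
)


def parse_declare(line):
    """Return (type, identifier, value or None) for a declare instruction."""
    match = DECLARE.fullmatch(line.strip())
    if match is None:
        raise ValueError(f"not a valid declare instruction: {line!r}")
    return match.group("type"), match.group("name"), match.group("value")
```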
Functions are subroutines in a program. They are a unit of work. They define a Symbolic Reference in the Program Scope.
Properties
Assembly
Assumptions are made about the calling convention:
Address | Description |
---|---|
SP (2) | Return Address |
SP+2 | Rightmost Parameter |
… | |
SP+P-n (n) | Leftmost Parameter |
SP+P to N-1 | Prior Stack Frames |
The function is emitted as follows ( * = Address of function start Symbolic Reference ):
| Address | Description |
|---|---|
| PC (3) | JMP to PC+F+1. Jump over function instructions. |
| * PC+3 | Increment SP by sum of function variable sizes. |
| | Store initial values for function variables in RAM. |
| | Instructions |
| PC+F | JMP to %0000. Reset program if function failed to return. |
When the instructions are run, they assume the following stack layout after function initialization has completed:
Address | Description |
---|---|
SP (V) | Function Variables |
SP+V (2) | Return Address |
SP+V+2 (P) | Function Parameters |
SP+V+2+P to N-1 | Prior Stack Frames |
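The layout above determines the SP-relative offset of every variable and parameter. The Python sketch below computes those offsets. It assumes function variables are laid out in declaration order starting at SP, which the table does not specify; parameters are placed with the rightmost one closest to the return address, as in the calling convention table. `frame_offsets` is a hypothetical name.

```python
def frame_offsets(variables, parameters):
    """Return SP-relative offsets for one stack frame.

    variables and parameters are lists of (name, size in bytes) in source
    order, with parameters listed left to right.
    """
    offsets = {}
    offset = 0
    for name, size in variables:
        # Assumption: declaration order, lowest address first.
        offsets[name] = offset
        offset += size
    offsets["<return address>"] = offset
    offset += 2
    for name, size in reversed(parameters):
        # The rightmost parameter sits directly above the return address.
        offsets[name] = offset
        offset += size
    return offsets


frame = frame_offsets([("level", 1)], [("a", 1), ("b", 2)])
```

Here V is 1 and P is 3, so prior stack frames begin at SP+6.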
Grammar
Function ::= 'FUNCTION' ' ' FunctionSignature ' ' FunctionBody ';'
FunctionSignature ::=
Identifier '(' (FunctionParameter (',' ' ' FunctionParameter)*)? ')'
FunctionParameter ::= Type ' ' Identifier
FunctionBody ::= '{' (EOL | ' ' | ';' | Comment | Instruction)* '}'
Example
# This function prints a null-terminated string by causing
an I/O Write request to address %0010 with the ASCII
value of the character to print on the data line. #
function print(byte* string) {
@ PRINT_NEXT_CHARACTER;
load string;
if zero goto PRINT_DONE;
write g_display;
increment string;
store string;
goto PRINT_NEXT_CHARACTER;
@ PRINT_DONE;
return;
}
Calling a function is accomplished with the Call instruction.
Calling a function always type-coerces any passed values into one level of indirection lower than the associated parameter in the signature of the function it’s calling.
Why is this?
It’s because the types in the function signature are relative to that function. Within the function, every parameter must be of a reference type: parameters are allocated on the stack when they are passed, so there is always a reference to them, and they are not compile-time constants. However, the value the caller places on the stack is always one level of indirection below the signature’s type.
Take, for example, the case of calling a function with the signature:
function foo (byte* bar) { ... }
bar is an address of a byte in memory. Specifically, it’s an address to a byte in memory, defined relative to the current stack pointer. That’s the definition of a function parameter. Working backwards from this definition, it implies that the actual byte value is in memory at a location relative to the current stack pointer. This is this function’s own version of the byte.
So if we called the function like this:
declare byte* bar = $10;
...
call foo(bar);
We don’t actually want to push bar to the stack. That would be a 16-bit word, the value of our address to a byte. We need to pass the actual value. We need to pass a byte, which is one level of indirection less than byte*.
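The rule can be stated as "drop one “*” from the parameter type to get the type that is pushed". The Python sketch below expresses that, along with the number of bytes pushed, assuming addresses are 16-bit words as described above. `pushed_type` and `pushed_size` are hypothetical helpers.

```python
def pushed_type(parameter_type):
    """Return the type of the value a caller pushes for one parameter."""
    if not parameter_type.endswith("*"):
        raise ValueError("function parameters must be reference types")
    return parameter_type[:-1]


def pushed_size(parameter_type):
    """Return the number of bytes pushed to the stack for one parameter."""
    value_type = pushed_type(parameter_type)
    if value_type.endswith("*"):
        return 2  # still an address, which is a 16-bit word
    return {"byte": 1, "word": 2}[value_type]
```

For the signature foo(byte* bar), the caller pushes a single byte, and inside foo that byte is reached through bar with one load from the stack.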
Assembly
| Address | Description |
|---|---|
| | Coerce leftmost calling parameter to type of leftmost function parameter and push to stack. |
| | … |
| | Coerce rightmost calling parameter to type of rightmost function parameter and push to stack. |
| PC+I-n (n) | CALL to Function Address |
| | Increment SP to “pop” all parameters. |
Grammar
Call ::= 'CALL' Space Address '(' (CallParameter (',' Space CallParameter)*)? ')'
CallParameter ::= ByteValue | WordValue | Identifier
Example
call Print ($01, foo, bar);