Overview

The Ninja Language is:

Procedural

Ninja has functions, but no built-in understanding of complex types.

Simple

Ninja was designed to be easy to parse and simple to build a compiler for.

The compiler requires only two passes:

The parsing pass: parses the source code into instructions and builds a symbol table for resolving program references.
The emitting pass: emits the binary associated with the instructions we parsed.
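The two passes can be sketched roughly as follows. This is a toy illustration in Python, not the Ninja compiler itself: the fixed three-byte instruction size, the tuple representation of instructions, and the names `parse_pass` and `emit_pass` are all assumptions made for the example.

```python
# Toy two-pass compile. Instruction shapes and sizes are hypothetical.
INSTRUCTION_SIZE = 3  # assumed: one opcode byte plus a 16-bit operand


def parse_pass(lines):
    """Pass 1: turn source lines into instructions and record label offsets."""
    instructions, symbols, offset = [], {}, 0
    for line in lines:
        text = line.strip().rstrip(";")
        if not text:
            continue
        if text.startswith("@"):
            # A label definition such as "@ DONE" names the current offset.
            symbols[text[1:].strip()] = offset
        else:
            op, _, operand = text.partition(" ")
            instructions.append((op, operand.strip()))
            offset += INSTRUCTION_SIZE
    return instructions, symbols


def emit_pass(instructions, symbols):
    """Pass 2: resolve each symbolic operand using the table from pass 1."""
    return [(op, symbols[operand]) for op, operand in instructions]
```

A forward reference such as `goto DONE` appearing before `@ DONE` is why the symbol table has to be complete before anything is emitted.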

It should be possible to build a compiler capable of dealing with programs whose binary size is almost as large as the amount of RAM available in the system by streaming the source code, intermediary symbol table, and final binary to/from the disk.

Overview
Contents
Conventions
Programs
Comments
Includes
Symbolic References
- Types
- Identifiers
- Values
Variables
Functions
Function Commands
- Memory
  - Set
  - Get
- I/O
  - Read
  - Write
- Math
  - Add
  - Subtract
  - Multiply
  - Divide
  - Remainder
- Program Flow
  - Call
  - Label
  - Goto
  - If
    - Equals
    - Not Equals
    - Zero
    - Not Zero
    - Less Than
    - Greater Than

Conventions

A few notes on the conventions used in this document, to avoid confusion before we go further.

Hex Values

Whenever you see a number starting with a $ it represents an 8-bit hexadecimal value.

When you see a number starting with % it represents a 16-bit hexadecimal value.

Memory Addresses

In tables in this document you’ll see memory addresses referenced.

They are of the form:

ADDRESS (LENGTH)

ADDRESS to ADDRESS

Where ADDRESS may be an expression.

PC generally refers to the Program Counter, or the offset in a binary.

SP generally refers to the Stack Pointer.

N generally refers to the size of RAM.

Other capitalized letters in address expressions generally refer to the size of some data structure or program instruction whose meaning must be inferred from the context in which it appears.

Programs

Programs represent an entire binary. A Program Scope is created immediately when the compiler begins parsing your source code. Programs can contain Comments, Program Instructions (Includes and Declares), and Functions.

Properties:

Global Compile-Time Constants
- Global Compile-Time Constants define Symbolic References in the Program Scope,
- are of a value type
- have a constant value, and
- do not consume any binary space since they are not included in the emitted binary apart from where they are used in your program.
Global Variables
- Global Variables define Symbolic References in the Program Scope,
- are of reference type
- have a default value, and
- consume binary space since they are heap allocated at compilation time.
Functions

Assembly:

If the binary is built using the --os flag, the first instruction in an emitted binary sets up the stack pointer. Otherwise, a Goto to a Function with name “main” taking no parameters is the first instruction.

Global Variables and Functions are grouped into two buckets, and within each bucket they are emitted in the order in which they appear in the source code.

Address Range	Instructions
0 (3)	`LOAD N-1 to SP`
3 (3)	`JUMP` to main
6 (G)	Global Variables
3+G+1 to P	Functions
P+1 to N-1-MaxStackSize-1	Heap (Not Emitted)
N-1-MaxStackSize to N-1	Stack (Not Emitted)
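As a worked example of the last two rows of the table, the following Python sketch computes the heap and stack ranges from N, P, and MaxStackSize. The function name and the sample numbers are assumptions chosen for illustration; the expressions themselves are copied from the table.

```python
def heap_and_stack_ranges(ram_size, last_program_address, max_stack_size):
    """Return inclusive (start, end) address pairs for the heap and the stack.

    ram_size is N, last_program_address is P, as used in the table.
    """
    stack = (ram_size - 1 - max_stack_size, ram_size - 1)
    heap = (last_program_address + 1, ram_size - 1 - max_stack_size - 1)
    return heap, stack


# Hypothetical machine: 64 KiB of RAM, program ends at 0x1FFF, 256-byte stack.
heap, stack = heap_and_stack_ranges(0x10000, 0x1FFF, 256)
```

With these sample numbers the heap begins directly after the program and ends one address below the stack.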

Grammar

Program ::= (EOL | ' ' | Comment | Include | Declare | Function)* EOF

Example

# Include Standard Library #
include "z80.ninja";

# This is a Compile-Time Constant.
  The value here is the address of the Display I/O port.
  Typically you'd use this value from the standard library,
  but it's here for the sake of illustration. #
declare word g_display = %0010;

# This is the message we want to print. #
declare byte* g_helloWorld = "Hello World";

# Main Program Entry Point #
function main() {
  call print(g_helloWorld);
  return;
}

# This function prints a null-terminated string by causing
  an I/O Write request to address %0010 with the ASCII
  value of the character to print on the data line. #
function print(byte* string) {
	@ PRINT_NEXT_CHARACTER;
	load string;
	if zero goto PRINT_DONE;
	write g_display;
	increment string;
	store string;
	goto PRINT_NEXT_CHARACTER;
	@ PRINT_DONE;
	return;
}

Comments

Developer comments are, of course, critical to any well-written program. Comments start with a # and end with a #. They may contain any character other than #, including newlines, so they may be single-line or multi-line. They act like the multi-line comments in C.

Grammar

Comment ::= '#' (AsciiCharacter | EOL)* '#'

Example

# This is a sample comment #
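Because a comment cannot contain a #, recognizing one needs no nesting logic. A minimal Python sketch of a comment stripper follows; the name `strip_comments` is an assumption, and the sketch deliberately ignores the case of a # inside a string literal, which a real parser would have to handle.

```python
import re

# '#', then any run of characters other than '#' (newlines included), then '#'.
COMMENT = re.compile(r"#[^#]*#")


def strip_comments(source):
    """Remove every comment from the source text."""
    return COMMENT.sub("", source)
```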

Includes

Including other files makes it easy to stay organized. Additional files are included in-line as if the include instruction were simply replaced by the contents of the other file. Like all other Program Instructions, the instruction must be terminated with a semicolon.

The compiler will search for the include in the same folder as the program file, as well as the directory specified by an optional parameter to the compiler: --include .

The compiler will not include a file more than once.
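The in-line replacement and the include-once rule can be sketched in Python as below. This is only an illustration under stated assumptions: files are modeled as an in-memory dictionary, directory searching is left out entirely, and the name `expand_includes` is hypothetical.

```python
import re

# Matches an include instruction; keyword case-insensitivity is an assumption.
INCLUDE = re.compile(r'include\s+"([^"]*)";', re.IGNORECASE)


def expand_includes(name, files, seen=None):
    """Inline each included file once; `files` maps file names to source text."""
    seen = set() if seen is None else seen
    seen.add(name)

    def replace(match):
        target = match.group(1)
        if target in seen:
            # Already included somewhere earlier: contributes nothing.
            return ""
        return expand_includes(target, files, seen)

    return INCLUDE.sub(replace, files[name])
```

Sharing one `seen` set across the recursion is what keeps a file that is reachable through two different paths from being emitted twice.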

Grammar

Include ::= 'INCLUDE' ' ' StringLiteral ';'

Example

# Include the standard library. #
include "z80.ninja";

Symbolic References

The system allows you to refer to values and reason about types using named references (think: variable names and function names).

Properties:

A Scope
- The Program Scope or a function scope.
An Identifier
A Type
An Allocation Type
- Heap, Stack, or None
A Value

The tuple < Scope, Identifier > must be unique.
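One way to enforce the uniqueness rule is to key the symbol table on the tuple itself. The Python sketch below assumes a simple in-memory table; the class name, method names, and the stored fields are illustrative choices, not part of the Ninja specification.

```python
class SymbolTable:
    """Symbolic References keyed by the tuple (scope, identifier)."""

    def __init__(self):
        self._symbols = {}

    def define(self, scope, identifier, type_name, allocation, value=None):
        key = (scope, identifier)
        if key in self._symbols:
            raise ValueError(f"{identifier!r} is already defined in scope {scope!r}")
        self._symbols[key] = (type_name, allocation, value)

    def lookup(self, scope, identifier):
        return self._symbols[(scope, identifier)]
```

The same identifier may appear in two different function scopes, but defining it twice in one scope is rejected.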

Types

Types let the system reason about how to access values at runtime.

The Ninja type syntax is different from most systems. Ninja references should not be confused with C-style pointers. Ours are different, even if they look similar.

In Ninja, we use “*” characters to indicate levels of indirection. “byte*” represents the address of a “byte”. To access the value, one would have to use the “byte*” value to load the actual “byte” value from RAM at the address at “byte*” (one load-from-RAM instruction, therefore one “*”).

It is impossible to create a variable of type “byte” which can be modified at runtime. A type “byte” would have a “0” level of indirection. Therefore there would be no way for anyone to reference its original location to change it. The only time the type “byte” would be used in a program is when it’s a compile-time constant. Otherwise, all Symbolic References refer to either heap or stack allocated locations in RAM (for Program Variables and Function Variables respectively).

Properties:

Value Type
- Value Type: Byte
- Value Type: Word
- That’s right. We got BOTH kinds! 8 AND 16 bit!
Levels of Indirection
- Types with 0 levels of indirection are compile time constants.
- Types with 1 level of indirection are called ‘value types’.
- Types with more levels of indirection are referred to as ‘reference types’.

Grammar

Type ::= ('BYTE' | 'WORD') ('*')*

Note: Types are one of two non-terminals in the grammar that require a single-byte look-ahead when parsing. This can be avoided by being clever about the terminal, and using a ‘look back’ in other non-terminal parsers.
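The Type grammar and the indirection categories listed under Properties can be combined in a short Python sketch. It uses a regular expression instead of a hand-written look-ahead parser, treats keywords as case-insensitive (an assumption based on the lowercase examples), and the name `parse_type` is hypothetical.

```python
import re

# ('BYTE' | 'WORD') ('*')* ; case-insensitive matching is an assumption.
TYPE = re.compile(r"(byte|word)(\**)", re.IGNORECASE)


def parse_type(text):
    """Return (value type, levels of indirection, category) for a type string."""
    match = TYPE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a type: {text!r}")
    levels = len(match.group(2))
    if levels == 0:
        category = "compile-time constant"
    elif levels == 1:
        category = "value type"
    else:
        category = "reference type"
    return match.group(1).lower(), levels, category
```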

Examples

byte

The preceding type would refer to a compile-time constant which does not require loading from RAM because it has been emitted as part of the binary.

byte*

The preceding type would refer to the address of a byte. The compiler would know that one level of indirection is present, so the actual value would have to be loaded from RAM.

Types are automatically de-referenced when type coercion is needed. (Their values are loaded from RAM automatically when pushing them to the stack, etc.)

Identifiers

Symbolic References (variables, constants, functions, parameters) have names. These names are Identifiers.

They must start with an ASCII character in the range [A-Z], and may then contain any character in the set [_, -, A-Z, 0-9].

Grammar

Identifier ::= [A-Z][0-9A-Z_-]*

Note: Like Types, Identifiers require a single-byte look-ahead when parsing.

Example

Some_Identifier
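The Identifier grammar translates directly into a regular expression. In the Python sketch below, case-insensitive matching is an assumption: the grammar lists only [A-Z], but the examples in this document use lowercase letters as well. The name `is_identifier` is illustrative.

```python
import re

# [A-Z][0-9A-Z_-]* from the grammar; IGNORECASE is an assumption (see above).
IDENTIFIER = re.compile(r"[A-Z][0-9A-Z_-]*", re.IGNORECASE)


def is_identifier(text):
    """True if the whole string is a valid Identifier."""
    return IDENTIFIER.fullmatch(text) is not None
```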

Values

Ninja allows you to specify values of the Byte and Word types using several different representations.

Grammar

Value ::= ByteLiteral | WordLiteral | ByteCharacter | StringLiteral

Byte Literals

Represents an 8-bit value, hex encoded.

Literal starts with $. There must always be two hex digits.

Assembly

Address Range	Instructions
PC (1)	Byte Value

Grammar

ByteLiteral ::= '$' [0-9A-F][0-9A-F]

Example

$41

Word Literals

Represents a 16-bit (2 byte) value, hex encoded.

Literal starts with %. There must always be four hex digits.

Assembly

Address Range	Instructions
PC (1)	Most Significant Byte
PC+1 (1)	Least Significant Byte

Grammar

WordLiteral ::= '%' [0-9A-F][0-9A-F][0-9A-F][0-9A-F]

Example

%23FE
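The byte and word literal rules, including the most-significant-byte-first layout of a word, can be sketched in Python as follows. The function name `encode_literal` is an assumption; the digit counts and byte order come from the grammar and assembly tables above.

```python
import re

BYTE_LITERAL = re.compile(r"\$([0-9A-F]{2})")   # '$' plus exactly two hex digits
WORD_LITERAL = re.compile(r"%([0-9A-F]{4})")    # '%' plus exactly four hex digits


def encode_literal(text):
    """Return the bytes emitted for a byte or word literal."""
    match = BYTE_LITERAL.fullmatch(text)
    if match:
        return bytes([int(match.group(1), 16)])
    match = WORD_LITERAL.fullmatch(text)
    if match:
        # Most significant byte first, as in the Word Literal assembly table.
        return int(match.group(1), 16).to_bytes(2, "big")
    raise ValueError(f"not a byte or word literal: {text!r}")
```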

Byte Character

Represents an 8-bit value, starts and ends with single ticks (“’”). Contains one ASCII character.

Assembly

Address Range	Instructions
PC (1)	Byte Value

Grammar

ByteCharacter ::= "'" AsciiCharacter "'"

Example

'A'

String Literal

A null-terminated byte array represented by the ASCII values inside double quotes.

Assembly

Address Range	Instructions
PC (1)	First character
… (n-3)	…
PC+n-2 (1)	Last character
PC+n-1 (1)	0x00
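The layout above means a string of k characters occupies n = k + 1 bytes. A minimal Python sketch of encoding a Byte Character and a String Literal follows; the function names are assumptions, and escape sequences are not handled because the document does not define any.

```python
def encode_character(text):
    """Encode a Byte Character such as 'A' as one ASCII byte."""
    if len(text) != 3 or text[0] != "'" or text[-1] != "'":
        raise ValueError(f"not a byte character: {text!r}")
    return text[1].encode("ascii")


def encode_string(text):
    """Encode a String Literal: its ASCII bytes followed by a zero byte."""
    if len(text) < 2 or text[0] != '"' or text[-1] != '"':
        raise ValueError(f"not a string literal: {text!r}")
    return text[1:-1].encode("ascii") + b"\x00"
```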

Grammar

StringLiteral ::= '"' (AsciiCharacter)* '"'

Example

"Apple"

Variables

Variables define Symbolic References to:

Constant Values
Statically heap-allocated values (as is the case for variables in the Program Scope)
Stack-allocated values (as is the case for variables and parameters in a Function.)

Program Scope and Function Scope variables must always be loaded from RAM before their value can be accessed; they are always of a reference type.

Grammar

Declare ::= 'DECLARE' ' ' Type ' ' Identifier (' ' '=' ' ' Value)? ';'
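For illustration, the Declare grammar can be approximated with a single regular expression. The Python sketch below is an assumption-laden shortcut rather than the compiler's parser: it matches keywords case-insensitively, accepts any non-empty text as the value without validating it against the Value grammar, and the name `parse_declare` is hypothetical.

```python
import re

DECLARE = re.compile(
    r"declare (?P<type>(?:byte|word)\**) (?P<name>[A-Z][0-9A-Z_-]*)"
    r"(?: = (?P<value>.+))?;",
    re.IGNORECASE,
)


def parse_declare(statement):
    """Return (type, identifier, value text or None) for a declare statement."""
    match = DECLARE.fullmatch(statement)
    if match is None:
        raise ValueError(f"not a declare statement: {statement!r}")
    return match.group("type"), match.group("name"), match.group("value")
```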

Examples

declare word gDisplayAddress = %0010;

The preceding declaration would refer to a compile-time constant of %0010, the I/O address of the display.

declare byte* gGameLevel = $01;

The preceding type would refer to the address of a byte.

declare word* gGameLevel = %CE05;

The preceding type would refer to the address of a word.

function main() {
	declare byte* gGameLevel = $01;
	return;
}

The preceding instruction would create a Function Scope variable which is allocated on the stack when the function begins. References to gGameLevel are always by reference since it must be loaded from RAM when accessed.

Functions

Functions are subroutines in a program. They are a unit of work. They define a Symbolic Reference in the Program Scope.

Properties

Identifier
Parameters
Stack allocated variables (function scope stack-relative symbolic references).
Instructions

Assembly

Assumptions are made about the calling convention:

Address	Description
SP (2)	Return Address
SP+2	Rightmost Parameter
	…
SP+P-n (n)	Leftmost Parameter
SP+P to N-1	Prior Stack Frames

The function is emitted as follows ( * = Address of function start Symbolic Reference ):

Address	Description
PC (3)	`JMP to PC+F+1` Jump over function instructions.
* PC+3	Increment SP by sum of function variable sizes
	Store initial values for function variables in RAM
	Instructions
PC+F	`JMP to %0000` Reset program if function failed to return.

When the instructions are run, they assume the following stack layout after function initialization has completed:

Address	Description
SP (V)	Function Variables
SP+V (2)	Return Address
SP+V+2 (P)	Function Parameters
SP+V+2+P to N-1	Prior Stack Frames
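The offsets in this table follow mechanically from V (total size of the function variables) and P (total size of the parameters). A small Python sketch, with an assumed function name and dictionary keys chosen for readability:

```python
RETURN_ADDRESS_SIZE = 2  # the return address is a 16-bit word


def frame_layout(variables_size, parameters_size):
    """Return SP-relative offsets of each region after function initialization."""
    variables = 0
    return_address = variables + variables_size
    parameters = return_address + RETURN_ADDRESS_SIZE
    prior_frames = parameters + parameters_size
    return {
        "variables": variables,
        "return_address": return_address,
        "parameters": parameters,
        "prior_frames": prior_frames,
    }
```

For a function with 3 bytes of variables and 4 bytes of parameters, the return address sits at SP+3 and the parameters begin at SP+5.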

Grammar

Function ::= 'FUNCTION' ' ' FunctionSignature ' ' FunctionBody ';'

FunctionSignature ::=
	Identifier '(' (FunctionParameter (',' ' ' FunctionParameter)*)? ')'

FunctionParameter ::= Type ' ' Identifier

FunctionBody ::= '{' (EOL | ' ' | ';' | Comment | Instruction)* '}'

Example

# This function prints a null-terminated string by causing
  an I/O Write request to address %0010 with the ASCII
  value of the character to print on the data line. #
function print(byte* string) {
	@ PRINT_NEXT_CHARACTER;
	load string;
	if zero goto PRINT_DONE;
	write g_display;
	increment string;
	store string;
	goto PRINT_NEXT_CHARACTER;
	@ PRINT_DONE;
	return;
}

Function Commands

I/O

Read

Write

Math

Add

Subtract

Multiply

Divide

Remainder

Program Flow

Call

Calling a function is accomplished with the Call instruction.

Calling a function always type-coerces any passed values into one level of indirection lower than the associated parameter in the signature of the function it’s calling.

Why is this?

It’s because the types in the function signature are relative to that function. Within the function, every parameter must be of reference type: parameters are allocated on the stack when they are passed, so there is a reference to them, and they are not compile-time constants. However, the value we place on the stack when calling the function is always one level of indirection below the signature’s type.

Take, for example, the case of calling a function with the signature:

function foo (byte* bar) { ... }

bar is an address of a byte in memory - specifically, it’s an address to a byte in memory, defined relative to the current stack pointer. That’s the definition of a function parameter. Working backwards from this definition, it implies that the actual byte value is in memory at a location relative to the current stack pointer. This is this function’s own version of the byte.

So if we called the function like this:

declare byte* bar = $10;
...
call foo(bar);

We don’t actually want to push bar to the stack. That would be a 16-bit word, the value of our address to a byte. We need to pass the actual value. We need to pass a byte - which is one level of indirection less than byte*.
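The rule "push one level of indirection below the signature's type" can be stated as a tiny Python sketch. The function names are assumptions, as is the choice to reject a parameter type with no indirection at all; the 16-bit address size comes from the paragraph above.

```python
def pushed_type(parameter_type):
    """Type of the value pushed for a parameter declared as `parameter_type`."""
    if not parameter_type.endswith("*"):
        # Assumed reading: a parameter always has at least one level of indirection.
        raise ValueError("a function parameter needs at least one level of indirection")
    return parameter_type[:-1]


def pushed_size(parameter_type):
    """Number of bytes pushed to the stack for one parameter."""
    pushed = pushed_type(parameter_type)
    if pushed.endswith("*"):
        return 2  # still an address, and addresses are 16-bit words
    return {"byte": 1, "word": 2}[pushed]
```

So calling foo, whose parameter is a byte*, pushes a single byte, while a byte** parameter would push a 16-bit address.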

Assembly

Address	Description
	Coerce leftmost calling parameter to type of leftmost function parameter and push to stack.
	…
	Coerce rightmost calling parameter to type of rightmost function parameter and push to stack.
PC+I-n (n)	`CALL to Function Address`
	Increment SP to “pop” all parameters.

Grammar

Call ::= 'CALL' Space Address '(' (CallParameter (',' Space CallParameter)*)? ')'

CallParameter ::= ByteValue | WordValue | Identifier

Example

call Print($01, foo, bar);

// NinjaCode Language Reference
