// NinjaCode Language Reference

8-Bit Retro PC Goodness. * Probably not as fast as a real ninja.

Overview

The Ninja Language is:

Procedural

Ninja has functions, but no built-in understanding of complex types.

Simple

Ninja was designed to be easy to parse and simple to build a compiler for.

The compiler requires only two passes.

It should be possible to build a compiler capable of dealing with programs whose binary size is almost as large as the amount of RAM available in the system by streaming the source code, intermediary symbol table, and final binary to/from the disk.

Conventions

A few quick notes on the conventions used in this document, to avoid confusion before we go further.

Hex Values

Whenever you see a number starting with a $ it represents an 8-bit hexadecimal value.

When you see a number starting with % it represents a 16-bit hexadecimal value.

Memory Addresses

In tables in this document you’ll see memory addresses referenced.

They are of the form:

ADDRESS (LENGTH)

or

ADDRESS to ADDRESS

Where ADDRESS may be an expression.

PC generally refers to the Program Counter, or the offset in a binary.

SP generally refers to the Stack Pointer.

N generally refers to the size of RAM.

Other capitalized letters in address expressions generally refer to the size of some data structure or program instruction, whose meaning must be inferred from the context in which it appears.

Programs

Programs represent an entire binary. A Program Scope is created immediately when the compiler begins parsing your source code. Programs can contain Comments, Program Instructions (Includes and Declares), and Functions.

Properties:

Assembly:

If the binary is built using the --os flag, the first instruction in an emitted binary sets up the stack pointer. Otherwise, a Goto to a Function with name “main” taking no parameters is the first instruction.

Global Variables and Functions are grouped into their two buckets and emitted in program compilation in the order they appear in the source code.

Address Range                 Instructions
0 (3)                         LOAD N-1 to SP
3 (3)                         JUMP to main
6 (G)                         Global Variables
3+G+1 to P                    Functions
P+1 to N-1-MaxStackSize-1     Heap (Not Emitted)
N-1-MaxStackSize to N-1       Stack (Not Emitted)
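
The last two rows of the table are plain arithmetic on N, P, and MaxStackSize. As a minimal sketch (the function and parameter names are hypothetical, not part of any Ninja tooling), the heap and stack ranges could be computed like this:

```python
def ram_regions(ram_size, max_stack_size, last_function_address):
    """Return inclusive (start, end) ranges for the heap and the stack.

    ram_size is N, last_function_address is P, both as used in the table.
    """
    n, p = ram_size, last_function_address
    stack = (n - 1 - max_stack_size, n - 1)
    heap = (p + 1, n - 1 - max_stack_size - 1)
    if heap[0] > heap[1]:
        raise ValueError("program and stack leave no room for a heap")
    return {"heap": heap, "stack": stack}
```

For example, with 64 KiB of RAM, a MaxStackSize of 256, and functions ending at address 4095, the heap would span 4096 to 65278 and the stack 65279 to 65535.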

Grammar

Program ::= (EOL | ' ' | Comment | Include | Declare | Function)* EOF

Program.png

Example

# Include Standard Library #
include "z80.ninja";

# This is a Compile Time Constant.
  The value here is the address of the Display I/O port.
  Typically you'd use this value from the standard library,
  but it's here for the sake of illustration. #
declare word g_display = %0010;

# This is the message we want to print. #
declare byte* g_helloWorld = "Hello World";

# Main Program Entry Point #
function main() {
  call print(g_helloWorld);
  return;
}

# This function prints a null-terminated string by causing
  an I/O Write request to address %0010 with the ASCII
  value of the character to print on the data line. #
function print(byte* string) {
	@ PRINT_NEXT_CHARACTER;
	load string;
	if zero goto PRINT_DONE;
	write g_display;
	increment string;
	store string;
	goto PRINT_NEXT_CHARACTER;
	@ PRINT_DONE;
	return;
}

Comments

Developer comments are, of course, critical to any well-written program. Comments start with a # and end with a #. They may contain any character other than #, including newlines (so they may be single or multi-line). They act like the multiline comments in C.

Grammar

Comment ::= '#' (AsciiCharacter | EOL)* '#'

Comment.png

Example

# This is a sample comment #
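
Because the same character opens and closes a comment, a scanner only needs a single flag to track whether it is inside one. The following is a rough sketch of that idea, not the actual compiler's implementation; the function name is hypothetical, and it assumes a # never appears inside a string literal:

```python
def strip_comments(source: str) -> str:
    """Remove '#'-delimited comments, which may span multiple lines."""
    out = []
    in_comment = False
    for ch in source:
        if ch == "#":
            # '#' both opens and closes a comment, so just flip the flag.
            in_comment = not in_comment
            continue
        if not in_comment:
            out.append(ch)
    if in_comment:
        raise SyntaxError("unterminated comment")
    return "".join(out)
```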

Includes

Including other files makes it easy to stay organized. Additional files are included in-line as if the include instruction were simply replaced by the contents of the other file. Like all other Program Instructions, the instruction must be terminated with a semicolon.

The compiler will search for the include in the same folder as the program file, as well as the directory specified by an optional parameter to the compiler: --include .

The compiler will not include a file more than once.
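
The in-line replacement, search order, and include-once rule described above might be sketched as follows. This is an illustration under stated assumptions: the files argument is a hypothetical in-memory stand-in for the disk, search_dirs lists the program's folder followed by the optional include directory, and the sketch does not skip include instructions that appear inside comments.

```python
import re

# Assumption: the keyword is matched case-insensitively, since the grammar
# shows 'INCLUDE' while the examples use 'include'.
INCLUDE_RE = re.compile(r'include "([^"]*)";', re.IGNORECASE)

def expand_includes(path, files, search_dirs, seen=None):
    """Return the text of files[path] with every include expanded in-line."""
    seen = set() if seen is None else seen
    seen.add(path)

    def replace(match):
        target = match.group(1)
        for directory in search_dirs:
            candidate = f"{directory}/{target}"
            if candidate in files:
                if candidate in seen:
                    return ""  # already included once; emit nothing
                return expand_includes(candidate, files, search_dirs, seen)
        raise FileNotFoundError(target)

    return INCLUDE_RE.sub(replace, files[path])
```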

Grammar

Include ::= 'INCLUDE' ' ' StringLiteral ';'

Include.png

Example

# Include the standard library. #
include "z80.ninja";

Symbolic References

The system allows you to refer to values and reason about types using named references (think: variable names and function names).

Properties:

The tuple < Scope, Identifier > must be unique.

Types

Types let the system reason about how to access values at runtime.

The Ninja type syntax is different from most systems. Ninja references should not be confused with C-style pointers. Ours are different, even if they look similar.

In Ninja, we use “*” characters to indicate levels of indirection. “byte*” represents the address of a “byte”. To access the value, one would have to use the “byte*” value to load the actual “byte” value from RAM at the address held in “byte*” (one load-from-RAM instruction, therefore one “*”).

It is impossible to create a variable of type “byte” which can be modified at runtime. A type “byte” would have a “0” level of indirection, so there would be no way for anyone to reference its original location to change it. The only time the type “byte” would be used in a program is when it’s a compile-time constant. Otherwise, all Symbolic References refer to either heap- or stack-allocated locations in RAM (for Program Variables and Function Variables respectively).

Properties:

Grammar

Type ::= ('BYTE' | 'WORD') ('*')*

Type.png

Note: Types are one of two non-terminals in the grammar requiring a single-byte look-ahead when parsing. This can be avoided by being clever about the terminal and using a ‘look back’ in other non-terminal parsers.

Examples

byte

The preceding type would refer to a compile-time constant which does not require loading from RAM because it has been emitted as part of the binary.

byte*

The preceding type would refer to the address of a byte. The compiler would know that one level of indirection is present, so the actual value would have to be loaded from RAM.

Types are automatically de-referenced when type coercion is needed. (The values are loaded from RAM automatically when pushing them to the stack, etc.)
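
One way to picture the Type grammar is as a base type plus a count of “*” characters, where the count is the number of loads from RAM needed to reach the underlying value. The sketch below is illustrative only: the function names are hypothetical, and it assumes keywords are case-insensitive because the grammar shows 'BYTE' while the examples use “byte”.

```python
import re

# Assumption: case-insensitive keywords (grammar is upper case, examples lower).
TYPE_RE = re.compile(r"(byte|word)(\**)", re.IGNORECASE)

def parse_type(text):
    """Return (base type, levels of indirection) for a Ninja type string."""
    match = TYPE_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a valid type: {text!r}")
    return match.group(1).lower(), len(match.group(2))

def needs_load(text):
    """True when the value must be loaded from RAM rather than read from the binary."""
    return parse_type(text)[1] > 0
```

Here “byte” parses to zero levels of indirection (a compile-time constant), while “byte*” parses to one level and therefore needs a load.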

Identifiers

Symbolic References (variables, constants, functions, parameters) have names. These names are Identifiers.

They must start with an ASCII character in the range [A-Z] and may contain any character in the set [_,-,A-Z,0-9].

Grammar

Identifier ::= [A-Z][0-9A-Z_-]*

Identifier.png

Note: Like Types, Identifiers require a single-byte look-ahead when parsing.

Example

Some_Identifier
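
The Identifier grammar translates almost directly into a regular expression. This is a sketch, not the compiler's own check: the function name is hypothetical, and matching is assumed to be case-insensitive, since the grammar lists only [A-Z] while the example above contains lower-case letters.

```python
import re

# Assumption: case-insensitive, so "Some_Identifier" is accepted.
IDENTIFIER_RE = re.compile(r"[A-Z][0-9A-Z_-]*", re.IGNORECASE)

def is_identifier(text: str) -> bool:
    """True when text is a letter followed by letters, digits, '_' or '-'."""
    return IDENTIFIER_RE.fullmatch(text) is not None
```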

Values

Ninja allows you to specify values of Byte and Word types using several different representations.

Grammar

Value ::= ByteLiteral | WordLiteral | ByteCharacter | StringLiteral

Value.png

Byte Literals

Represents an 8-bit value, hex encoded.

Literal starts with $. There must always be two hex digits.

Assembly

Address Range Instructions
PC (1) Byte Value

Grammar

ByteLiteral ::= '$' [0-9A-F][0-9A-F]

ByteLiteral.png

Example

$41

Word Literals

Represents a 16-bit (2 byte) value, hex encoded.

Literal starts with %. There must always be four hex digits.

Assembly

Address Range Instructions
PC (1) Most Significant Byte
PC+1 (1) Least Significant Byte

Grammar

WordLiteral ::= '%' [0-9A-F][0-9A-F][0-9A-F][0-9A-F]

WordLiteral.png

Example

%23FE
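
To make the two encodings concrete, here is a small sketch that validates a Byte or Word Literal and returns the bytes that would be emitted. The function name is hypothetical. Following the grammar, only upper-case hex digits are accepted, and following the Word Literal table, the most significant byte is emitted first.

```python
import re

BYTE_RE = re.compile(r"\$([0-9A-F]{2})")   # '$' plus exactly two hex digits
WORD_RE = re.compile(r"%([0-9A-F]{4})")    # '%' plus exactly four hex digits

def encode_literal(text: str) -> bytes:
    """Return the emitted bytes for a Byte Literal or Word Literal."""
    match = BYTE_RE.fullmatch(text)
    if match:
        return bytes([int(match.group(1), 16)])
    match = WORD_RE.fullmatch(text)
    if match:
        # Most significant byte at PC, least significant at PC+1.
        return int(match.group(1), 16).to_bytes(2, "big")
    raise ValueError(f"not a byte or word literal: {text!r}")
```

Under these assumptions, $41 encodes to the single byte 0x41, and %23FE encodes to 0x23 followed by 0xFE.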

Byte Character

Represents an 8-bit value, starts and ends with single ticks (“’”). Contains one ASCII character.

Assembly

Address Range Instructions
PC (1) Byte Value

Grammar

ByteCharacter ::= "'" AsciiCharacter "'"

ByteCharacter.png

Example

'A'

String Literal

A null-terminated byte array represented by the ASCII values inside double quotes.

Assembly

Address Range Instructions
PC (1) First character
… (n-3)
PC+n-2 (1) Last character
PC+n-1 (1) 0x00
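
The layout in the table above can be sketched as a short encoder: the characters between the quotes are emitted as ASCII bytes, followed by a terminating 0x00. The function name is hypothetical, and the sketch assumes a double quote cannot appear inside the literal, since the grammar defines no escape sequences.

```python
def encode_string(literal: str) -> bytes:
    """Return the null-terminated bytes for a double-quoted String Literal."""
    if len(literal) < 2 or literal[0] != '"' or literal[-1] != '"':
        raise ValueError("string literal must be wrapped in double quotes")
    body = literal[1:-1]
    if '"' in body:
        raise ValueError("no escape syntax is assumed for embedded quotes")
    # encode() raises UnicodeEncodeError (a ValueError) on non-ASCII input.
    return body.encode("ascii") + b"\x00"
```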

Grammar

StringLiteral ::= '"' (AsciiCharacter)* '"'

StringLiteral.png

Example

"Apple"

Variables

Variables define Symbolic References to:

Program and Function Scope variables must always be loaded from RAM before their value can be accessed; they are always of a reference type.

Grammar

Declare ::= 'DECLARE' ' ' Type ' ' Identifier (' ' '=' ' ' Value)? ';'

Declare.png

Examples

declare word gDisplayAddress = %0010;

The preceding declaration would refer to a compile-time constant of %0010, the I/O address of the display.

declare byte* gGameLevel = $01;

The preceding declaration would refer to the address of a byte.

declare word* gGameLevel = %CE05;

The preceding declaration would refer to the address of a word.

function main() {
	declare byte* gGameLevel = $01;
	return;
}

The preceding instruction would create a Function Scope variable which is allocated on the stack when the function begins. References to gGameLevel are always by reference since it must be loaded from RAM when accessed.
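
As a rough illustration of the Declare grammar, the sketch below splits a declaration into its type, indirection level, identifier, and optional initial value. It is not the real parser: the names are hypothetical, keywords and identifiers are assumed to be case-insensitive (the grammar is upper case, the examples are not), and the value is captured as raw text without being validated.

```python
import re

# Assumption: case-insensitive matching; the value is left unvalidated.
DECLARE_RE = re.compile(
    r"declare (byte|word)(\**) ([A-Z][0-9A-Z_-]*)(?: = (.+))?;",
    re.IGNORECASE,
)

def parse_declare(text):
    """Return (base type, indirection, identifier, value text or None)."""
    match = DECLARE_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a declare instruction: {text!r}")
    base, stars, name, value = match.groups()
    return base.lower(), len(stars), name, value
```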

Functions

Functions are subroutines in a program. They are a unit of work. They define a Symbolic Reference in the Program Scope.

Properties

Assembly

Assumptions are made about the calling convention:

Address Description
SP (2) Return Address
SP+2 Rightmost Parameter
 
SP+P-n (n) Leftmost Parameter
SP+P to N-1 Prior Stack Frames

The function is emitted as follows ( * = Address of function start Symbolic Reference ):

Address Description
PC (3) JMP to PC+F+1 Jump over function instructions.
* PC+3 Increment SP by sum of function variable sizes
  Store initial values for function variables in RAM
  Instructions
PC+F JMP to %0000 Reset program if function failed to return.

When the instructions are run, they assume the following stack layout after function initialization has completed:

Address                 Description
SP (V)                  Function Variables
SP+V (2)                Return Address
SP+V+2 (P)              Function Parameters
SP+V+2+P to N-1         Prior Stack Frames
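
The offsets in this table follow from V and P alone. The sketch below computes them relative to SP, and also places individual parameters using the calling convention described earlier, in which the rightmost parameter sits closest to the return address. All names are hypothetical, and sizes are supplied by the caller in bytes; the order of variables inside the V region is not specified here, so the sketch does not attempt it.

```python
def frame_layout(variable_sizes, parameter_sizes):
    """Return SP-relative offsets of each region after function initialization."""
    v = sum(variable_sizes)
    p = sum(parameter_sizes)
    return {
        "variables": 0,
        "return_address": v,
        "parameters": v + 2,       # the return address occupies 2 bytes
        "prior_frames": v + 2 + p,
    }

def parameter_offsets(variable_sizes, parameter_sizes):
    """Return SP-relative offsets for parameters, listed left to right."""
    offset = sum(variable_sizes) + 2
    offsets = [0] * len(parameter_sizes)
    # The rightmost parameter is at the lowest address, so walk right to left.
    for index in reversed(range(len(parameter_sizes))):
        offsets[index] = offset
        offset += parameter_sizes[index]
    return offsets
```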

Grammar

Function ::= 'FUNCTION' ' ' FunctionSignature ' ' FunctionBody ';'

Function.png

FunctionSignature ::=
	Identifier '(' (FunctionParameter (',' ' ' FunctionParameter)*)? ')'

FunctionSignature.png

FunctionParameter ::= Type ' ' Identifier

FunctionParameter.png

FunctionBody ::= '{' (EOL | ' ' | ';' | Comment | Instruction)* '}'

FunctionBody.png

Example

# This function prints a null-terminated string by causing
  an I/O Write request to address %0010 with the ASCII
  value of the character to print on the data line. #
function print(byte* string) {
	@ PRINT_NEXT_CHARACTER;
	load string;
	if zero goto PRINT_DONE;
	write g_display;
	increment string;
	store string;
	goto PRINT_NEXT_CHARACTER;
	@ PRINT_DONE;
	return;
}

Function Commands

I/O

Read

Write

Math

Add

Subtract

Multiply

Divide

Remainder

Program Flow

Call

Calling a function is accomplished with the Call instruction.

Calling a function always type-coerces any passed values into one level of indirection lower than the associated parameter in the signature of the function it’s calling.

Why is this?

It’s because the type references in the function signature are relative to that function. Within the function, any parameter must always be of reference type: parameters are allocated on the stack when they are passed, so there is a reference to them. They are not compile-time constants, so they must be of reference type. However, the value we place on the stack when calling the function is always one level of indirection below the signature’s type.

Take, for example, the case of calling a function with the signature:

function foo (byte* bar) { ... }

bar is an address of a byte in memory - specifically, it’s an address to a byte in memory, defined relative to the current stack pointer. That’s the definition of a function parameter. Working backwards from this definition, it implies that the actual byte value is in memory at a location relative to the current stack pointer. This is this function’s own version of the byte.

So if we called the function like this:

declare byte* bar = $10;
...
call foo(bar);

We don’t actually want to push bar to the stack. That would be a 16-bit word, the value of our address to a byte. We need to pass the actual value. We need to pass a byte - which is one level of indirection less than byte*.
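
The rule can be sketched as "drop one “*” from the signature type to get the type of the value pushed". The helper names below are hypothetical. The sizes assume that a byte occupies 1 byte and that a word or any address occupies 2 bytes, in line with the 16-bit address mentioned above.

```python
def pushed_type(parameter_type: str) -> str:
    """Return the type of the value actually pushed for a signature type."""
    if not parameter_type.endswith("*"):
        raise ValueError("function parameters must be of reference type")
    return parameter_type[:-1]  # one level of indirection lower

def pushed_size(parameter_type: str) -> int:
    """Return the number of bytes pushed (assumes 16-bit words and addresses)."""
    value_type = pushed_type(parameter_type)
    if value_type.endswith("*") or value_type == "word":
        return 2
    return 1
```

For the signature foo(byte* bar), this gives a pushed type of “byte” and a pushed size of one byte, matching the reasoning above.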

Assembly

Address Description
  Coerce leftmost calling parameter to type of leftmost function parameter and push to stack.
 
  Coerce rightmost calling parameter to type of rightmost function parameter and push to stack.
PC+I-n (n) CALL to Function Address
  Increment SP to “pop” all parameters.

Grammar

Call ::= 'CALL' Space Address '(' (CallParameter (',' Space CallParameter)*)? ')'

Call.png

CallParameter ::= ByteValue | WordValue | Identifier

CallParameter.png

Example

call Print($01, foo, bar);

Label

Goto

If

Equals

Not Equals

Zero

Not Zero

Less Than

Greater Than