Using the P65 Assembler

The P65 is a powerful assembler for the 6502 family of processors. This document will walk you through the features of P65 and discuss how to take advantage of its flexibility. It will not teach you 6502 assembly; for that, you will want to go to the Links page.

For tables and examples of the various addressing modes and data types, please see the command reference.

This tutorial is based on version 1.1 of P65-Perl. Ophis uses slightly different command structures, and will be covered in a different document.

Running the assembler

P65 is written in Perl, so the precise things you need to do to run it vary by OS - but at bottom, it's no different from any other Perl script. Check your local documentation.

The script itself takes, at minimum, two arguments: a source file and a target file. The first filename it sees is taken to be the assembler source; the second is the name of the binary file it is to produce. Other options may be available depending on your version of the assembler; invoke the assembler with no arguments to see a complete list.

Writing code

For the most part, the instruction format in P65 follows the normal conventions. However, the "accumulator" addressing mode takes no arguments: write LSR rather than LSR A (P65 would read the statement LSR A as right-shifting the memory location pointed to by the label "A"). Also, you may only write one instruction per line.

Most instructions (except for Implied and Accumulator mode instructions) take arguments. These may be specified in several ways. Decimal numbers (base 10) may be entered directly. Hexadecimal (base 16) numbers are prefixed by a $. Octal (base 8) numbers are prefixed by a zero. Binary numbers are prefixed by a %. If you wish to encode an ASCII character, put an apostrophe in front of it.

Thus, 65 = $41 = 0101 = %01000001 = 'A.
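
As a quick illustration (LDA and the # immediate prefix are ordinary 6502 syntax, chosen here only as an example), each of these lines should assemble to the same two-byte load-immediate instruction:

lda #65          ; decimal
lda #$41         ; hexadecimal
lda #0101        ; octal
lda #%01000001   ; binary
lda #'A          ; ASCII character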

To refer to a label, simply type the name of that label. Labels are not case sensitive.

The special label ^ refers to the value of the program counter at the beginning of the line; this is more useful when used in expressions (see "More complex arguments" below). However, the statement jmp ^ gives one a free infinite loop.

P65 will automatically select the optimum instruction encoding, taking every opportunity to use a Zero Page representation, and will automatically compute relative offsets for branches from the target labels (or absolutely specified addresses).
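
For instance, assuming the addresses below are free for your program's use (they are picked purely for illustration), the encoding follows from the size of the argument, and a branch needs nothing more than a label:

        lda $80      ; argument fits in one byte: zero-page encoding
        lda $0280    ; argument needs two bytes: absolute encoding
loop:   dex
        bne loop     ; relative offset is computed from the label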

Defining labels

To define an ordinary label, simply prefix the line with the label name and a colon. Labels may be any combination of letters, numbers, underscores, and apostrophes, but must begin with a letter. You may not use other instruction components as labels; the opcodes, "x", and "y" are thus not permissible label names.
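
A few label definitions, as a sketch (every name here is made up for illustration):

main:     jsr init_2    ; refers to a label defined further down
          rts
init_2:   ldx #$08      ; digits and underscores are allowed after the first letter
init_2':  dex           ; so are apostrophes
          bne init_2'
          rts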

Comments

Anything from a semicolon to the end of a line is treated as a comment and ignored. The exceptions to this are (a) inside of strings (see below), or (b) when backslashed. To specify the ASCII constant for a semicolon, write '\;.
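
The two cases look like this in practice (a sketch; LDA is used only as an example instruction):

lda #$3B    ; everything from the semicolon onward is ignored
lda #'\;    ; the backslashed semicolon is the character constant $3B, and this comment starts at the second semicolon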

Specifying data constants

Numeric data

To specify numeric data, use the .byte or .word pragmas. Simply specify the values as one would in an instruction. Multiple values may be represented by separating them with commas.

All values in a .byte pragma must be within the range of $00 to $FF. They take up one byte each. All values in a .word pragma are treated as two-byte values, stored low byte first. Values that would fit within one byte are expanded to two (high byte = $00); values that won't fit into two bytes (not between $0000 and $FFFF) are flagged as errors.
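
A short sketch of both pragmas, with the bytes each line should produce noted in the comments:

.byte 1, $02, %00000011, 'A   ; four bytes: $01 $02 $03 $41
.word $1234, 7                ; four bytes: $34 $12 $07 $00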

String data

String data is represented with the .ascii directive. Only one string is allowed per line.

Strings are enclosed in double quotes, "like this." To place a double quote inside a string, backslash it first.
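
For example (the text of the strings is arbitrary):

.ascii "HELLO, WORLD"
.ascii "Press \"fire\" to start"   ; backslashed double quotes inside the string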

Jump tables

Jump tables are lists of addresses stored in memory. To produce one, just hand a sequence of labels to the .word directive.
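
A minimal sketch, with three placeholder routines whose names are invented for the example:

do_load:  rts
do_save:  rts
do_quit:  rts
table:    .word do_load, do_save, do_quit   ; three addresses, each stored low byte first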

More complex arguments

Arguments can be considerably more than just a number or label. The simplest enhancement to them is the addition of offsets. An offset is a number that is added to or subtracted from the basic value at assemble time. If the label array points at location $C000, for example, then the line LDA array+3 assembles down to the same instruction as LDA $C003. It is not necessary for the core value to be a label, either; if you wish to make clearer in the source that the number you're comparing against is a base plus an offset, you can say things like $FF-$20, or even $110-32. Type and range checking does not occur until after the whole expression has been evaluated.

In addition to offsets, there are many times when one is working with 16-bit constants and wants to get an 8-bit chunk of them. P65 provides simple operators for taking the low byte or high byte of a value. To get at the low byte, simply put a < sign in front of it; for the high byte, use >. Note that the low byte or high byte is computed after adding or subtracting any offset.

No range checking occurs until after the full evaluation has been completed. This means it's perfectly OK to write a command like LDA #>$D020-$23 - it evaluates down to $CF, which is in range for a Load Immediate instruction.

These operations work inside of the .byte and .word pragmas as well, so they may (among other things) be used to store values in nonstandard formats. For example, the Amiga stores its addresses high-byte first (the big-endian format), and some emulation formats (such as PSID) require a number of high-byte-first addresses as arguments. You can represent a big-endian address in P65 by writing
.byte >addr, <addr
which stores the address of the label addr with its high byte first.
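
To make the order of evaluation concrete, here is a sketch using $D020 as an illustrative base value:

lda #<$D020            ; low byte of $D020:  $20
lda #>$D020            ; high byte of $D020: $D0
lda #>$D020-$23        ; $D020-$23 = $CFFD is computed first, then its high byte: $CF
lda $D020+1            ; same instruction as lda $D021
.word $D020            ; stored low byte first:  $20 $D0
.byte >$D020, <$D020   ; stored high byte first: $D0 $20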

Anonymous labels

Programs with complicated control flow (many loops and conditionals) frequently suffer from namespace pollution: you need to define a great many labels that are not of any global importance, and don't really deserve names. P65 addresses this problem with a facility for anonymous labels.

Anonymous labels are just what they sound like: labels with no names. To define an anonymous label, put an asterisk ("*") where a label would normally go. No colon is necessary with anonymous labels.

To use anonymous labels, you use a sequence of + or - signs. These are treated as special labels. The label - refers to the immediately previous anonymous label, -- two labels back, and so on. Likewise, the label + refers to the next anonymous label, and ++ the one after that, and so on.

Anonymous labels can be used in complex arguments as well, but while the P65 parser is smart enough to realize that +++3 is the address of the temporary label after the next, plus 3, the humans reading the code probably would like you much better if you used an ordinary label instead. Use common sense.
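
As a sketch of typical use, here is a loop that copies bytes until it meets a zero. The labels source and dest are hypothetical and assumed to be defined elsewhere in the program:

        ldx #$00
*       lda source,x   ; first anonymous label: top of the loop
        beq +          ; zero byte found: skip ahead to the next anonymous label
        sta dest,x
        inx
        bne -          ; branch back to the previous anonymous label
*       rts            ; second anonymous label: the exit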

Memory management

In the process of assembly, P65 needs to convert the symbolic labels that you use in your code into numbers - the actual memory addresses. In order to do this, it keeps a program counter that tells it where it is in memory. This lets it assign correct values to labels, and gives it a proper jumping-off point for relative instructions.

Although the 6502 has a flat memory architecture, P65 supports segmentation simulation. In effect, each segment has its own program counter, maintained independently of the others. Any given segment, when defined, starts at zero, and then increases as the program unfolds. The program counter may be manipulated through several directives.

Setting the program counter

The most basic directive is .org. This takes one argument, and sets the program counter to that argument. This doesn't actually do anything to the binary (except change the value of subsequent labels), so using .org in a chunk of code that the software you are targeting expects to be contiguous will break your binary. Typically, .org will be used at least once at the top of the file to set the "origin" of the code: the memory location at which the code will find itself loaded.

To set the program counter within a contiguous block of memory, use .advance. This is like .org, except that it pads the file with zeros until the program counter matches the argument. Attempting to .advance backwards flags an error. (Attempting to .org backwards is legal, and sometimes even what you want - see the "File management" section for details.)

Generally, the arguments passed to .org and .advance will be simple numbers; however, it is legal to use labels as well, as long as the label is defined somewhere earlier in the file. This ensures that label definitions don't end up being defined in terms of themselves.
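
A small sketch of the difference between the two directives (the addresses and label names are arbitrary choices for the example):

.org $C000            ; labels from here on count up from $C000; nothing is written
main:   rts           ; one byte of output, so the program counter is now $C001
.advance $C100        ; zeros are written until the program counter reaches $C100
table:  .byte 1, 2, 3 ; table therefore has the value $C100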

Organizing the symbol table

The symbol table is the data structure inside the assembler that maps labels to memory locations. Not all of these need to be labels in your program code; labels can also point at the eventual site of variables elsewhere in memory, or to predefined library routines, or to specific hardware registers. The basic tool for preparing these extra labels is .alias.

.alias takes two arguments: a label name and an expression. It evaluates the expression, and then constructs a new label in the symbol table with that value. As with .org and .advance, the expression may not refer to any label that has not already been defined.

It is possible to work out the global variables by hand, and then put at the top (or bottom) of the code the memory locations:

.alias var1 $C000   ; byte
.alias var2 $C001   ; word
.alias var3 $C003   ; 8-byte array
.alias var4 $C00B   ; word

However, it is safer not to hard-code them: if a variable's size changes, or if you decide to relocate the data to another block of memory, then all of the addresses must be recomputed by hand. It is better to do computation on the program counter, like this:

.org $C000         ; Make sure this is before or after all the other code!
var1: .org ^+1
var2: .org ^+2
var3: .org ^+8
var4: .org ^+2

The lbl: .org ^+n construct is useful enough, but opaque enough, that a separate directive - .space - provides a shorthand. The sample code above then becomes:

.org $C000         ; Make sure this is before or after all the other code!
.space var1 1
.space var2 2
.space var3 8
.space var4 2

Again, because this involves use of the .org directive, it should not be used inside of a contiguous block of code. Put it before the main code's .org statement, or at the very end of the source. Never put it anywhere inside code that's supposed to be .included inside something else, as those source files don't necessarily touch the "beginning" or "end" of the full source. .alias is safe everywhere.

Working with segments

An alternative to putting all your .space directives at the beginning or end of the program is to use data and text segments. The basic directives for switching between a data segment (which generally should only have .org or .space commands) and a text segment (which contains the program text) are .data and .text. Each segment needs its own .org statement at the beginning.

If your program file does not specify a segment, it starts in the text segment.

If you need to have multiple regions of memory (say, a zero page, a general data section, and program text), you can use the .segment directive. It takes the name of a segment, which follows the same rules as a label name, and you may create any number of distinct segments this way. To create a special segment for the zero page, for instance, one would switch to it by saying .segment zp.

.text and .data are really shorthand for .segment text and .segment data.
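
Putting these together, a program with three segments might be laid out as in this sketch. The origins and variable names are illustrative assumptions, not requirements:

.segment zp
.org $0
.space ptr 2        ; zero-page pointer at $00
.data
.org $C000
.space buffer 8     ; data segment starts at $C000
.text
.org $8000
main:   lda buffer
        sta ptr
        rts
.data
.space flag 1       ; the data segment's own counter resumes after buffer
.text

Because each segment keeps its own program counter, switching back to .data should pick up where that segment left off rather than where the text segment currently is.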

Preventing data overruns

If you've got multiple sections of memory that are supposed to be independent, overuse of the .space directive may cause the regions to overlap. This is usually a Bad Thing.

The .checkpc directive helps check for this condition. .checkpc takes an address as an argument, much like .org or .advance. However, instead of changing the program counter, it ensures that the program counter has not exceeded its value. For example, if you have a zero-page segment, and want to ensure that it does not overwrite the stack, end it with a .checkpc $100 statement.
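
In a sketch, with made-up variable names:

.segment zp
.org $0
.space ptr 2
.space scratch 16
.checkpc $100       ; flags an error if the zero-page variables have run past $100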

File management

Most significant assembler projects will span multiple files. P65 includes two techniques for combining files.

The most basic is the .include directive. Typing .include "foo.p65" will, for all practical purposes, cut-and-paste the entirety of foo.p65 into that point in your code. (It doesn't quite do that, as that would mess up error reporting. Lines are tracked by line number and file name.)

The .link directive works like .include, but takes a (typically numerical) address as an additional argument. It then includes the file, but first it sets the program counter to be equal to the second argument. This directive is normally used for organizing a complex file that does not represent a contiguous piece of memory.

.link is technically a redundant operation: it is precisely equivalent to following an .org with an .include. However, linking files look cleaner with it.
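
That is, for a hypothetical file music.p65 that is meant to sit at $9000, the line

.link "music.p65" $9000

does the same job as the pair

.org $9000
.include "music.p65"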

Putting it all together

In this final section, we demonstrate how to orchestrate the subsidiary information to produce a runnable file on two very different platforms.

Commodore 64 - the .PRG format

The Commodore 64 system had a very simple file format. The first two bytes indicated where in memory the rest of the file should load, and the rest of the file was one continuous block.

The BASIC program area begins at location $0800: the byte there must be zero, and the program text itself starts at $0801. By writing a one-line BASIC program that jumps to a machine language routine, and placing it at the beginning of the file, we have sufficient information for a program that can be loaded with the LOAD command and then executed with the RUN command.

The C-64 is almost all RAM, so we don't really need to keep our data away from the program code. However, just for practice, let's put our variables in the upper memory area, between the BASIC ROM and the I/O ports.

.word $0800
.org $0800
.byte $00,$0c,$08,$0a,$00,$9e,$20,$32,$30,$36,$34,$00,$00,$00,$ff,$ff 
.data
.org $C000
.text

The sixteen bytes in the .byte command correspond to the BASIC program 10 SYS 2064. Decimal 2064 is location $0810, which is also where the program counter is pointing at the end of the header. To use the header, simply .include it at the top of your code, and follow it immediately with your main program routine.
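
A program using the header might then look like the following sketch. The file name c64header.p65 and the variable name are assumptions made for the example; $D020 is the C-64's border color register:

.include "c64header.p65"   ; hypothetical name for a file holding the header above
.data
.space color 1             ; lands at $C000, where the header set the data segment
.text
main:   lda #$00           ; main is at $0810, the target of the SYS 2064
        sta color
        sta $D020          ; black border
        rts                ; return to BASIC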

A more complex example: an NES cartridge

The 8-bit Nintendo Entertainment System uses cartridges with ROM chips on them to store its programs. The current standard for representing these on emulators is the so-called iNES format. The iNES format includes in its header a "mapper number" which indicates what scheme the program used to control bankswitching.

The NES also had two entirely separate address spaces - one for the program (PRG-ROM), and one for graphics (CHR-ROM).

For our example, we will consider Mapper #2. This mapper did not have any CHR-ROM, and for this example we assume four banks of PRG-ROM. Each bank was 16 kilobytes in size, for a total of 64KB of code. The last bank is hardwired into location $C000, and the other three could be bankswitched into location $8000. The iNES format constrains all of these banks to be precisely 16 kilobytes in size.

Since the program is stored on a ROM chip, the data will have to be in a separate area. In fact, it will have to be in two separate areas, because we want to use the zero page as much as we can, and we don't want to smash the stack in page 1. Therefore, we define the data segment to start at location $200, and an additional segment to start at location $0. We'll use .checkpc to ensure we don't smash the stack (hit page 1) or run out of RAM (go past $7FF).

The linking file would thus look something like this:

.include "iNESheader.p65"
.segment zp
.org $0
.data
.org $200
.text
.link "lowbank0.p65" $8000
.advance $C000
.link "lowbank1.p65" $8000
.advance $C000
.link "lowbank2.p65" $8000
.advance $C000
.link "highbank.p65" $C000
.advance $10000
.segment zp
.checkpc $100
.data
.checkpc $800

The .advance statements ensure that the final ROM blocks are the correct size.
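
The contents of iNESheader.p65 are not shown in this tutorial. As a rough sketch, assuming the usual 16-byte iNES header layout and the four-bank, no-CHR-ROM configuration described above, the file might contain something like this:

.byte $4E,$45,$53,$1A   ; signature: the letters N, E, S followed by $1A
.byte $04               ; number of 16KB PRG-ROM banks
.byte $00               ; number of 8KB CHR-ROM banks (none)
.byte $20               ; flags: low nibble of the mapper number (2) in the upper four bits
.byte $00               ; flags: high nibble of the mapper number (0)
.byte 0,0,0,0,0,0,0,0   ; padding to fill out the 16 bytes

The flag bytes here assume no battery, no trainer, and horizontal mirroring; adjust them for a real cartridge.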

