README

NAME:
	psim - Yet another MIPS processor simulator.

SYNOPSIS:
	psim (-c N|-n NxM) [-m MEMSZ] program

DESCRIPTION:
This simulator runs a subset of the little-endian MIPS32 (MIPSEL)
specification. Full compliance with the specification is a non-goal; in fact,
it implements only enough instructions to run the examples built on the
development machine (which uses GCC 10.2). It does not implement caches or any
kind of memory management. It has no delay slots. It does not implement
interrupts of any kind.

This simulator is intended to be a multiprocessor simulator and has two modes
of operation:
	* A shared memory mode;
	* A networked mode, with distributed memory. The network is an
	  NxM 2D mesh in which each node is a processor with private
	  memory and communication links that it uses to communicate
	  with its neighbors. This mimics how the transputer's
	  networking capabilities were initially conceived.
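
To make the mesh topology above concrete, here is a minimal sketch of how the
neighbors of a node could be enumerated. It is not taken from the simulator's
source. It assumes that processor IDs are assigned in row-major order
(id = row * cols + col) and that the mesh does not wrap around at the edges.
The function name `mesh_neighbors` and the mapping of N and M onto `rows` and
`cols` are hypothetical.

```c
#include <stddef.h>

/*
 * Fill out[] with the IDs of the mesh neighbors of processor `id` and
 * return how many were written (at most 4).
 * ASSUMPTION: row-major IDs (id = row * cols + col), no wraparound links.
 */
size_t mesh_neighbors(unsigned id, unsigned rows, unsigned cols,
                      unsigned out[4])
{
	unsigned row = id / cols;
	unsigned col = id % cols;
	size_t n = 0;

	if (row > 0)
		out[n++] = id - cols;	/* north */
	if (row + 1 < rows)
		out[n++] = id + cols;	/* south */
	if (col > 0)
		out[n++] = id - 1;	/* west */
	if (col + 1 < cols)
		out[n++] = id + 1;	/* east */
	return n;
}
```

Under these assumptions, a corner node of a 2x3 mesh has two neighbors and an
edge node has three.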

The objective of this simulator is to compare the performance of equivalent
programs written specifically for each of the two supported modes.

OPTIONS:
-c N		Run in shared memory mode with N processors.
-n NxM		Run in network mode with a matrix of NxM processors.
-m MEMSZ	Set size (in bytes) of the simulation's memory
		(default: 16MiB).
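
As an illustration of the NxM argument format accepted by -n, the sketch below
validates and splits such a string. It is only a sketch of the format
described above, not the simulator's actual option parser. The helper name
`parse_mesh` and its rejection of zero-sized dimensions are assumptions.

```c
#include <ctype.h>
#include <stdlib.h>

/*
 * Parse a string of the form "NxM" (decimal, both sides non-zero).
 * Returns 0 and stores the two dimensions on success, -1 otherwise.
 * ASSUMPTION: hypothetical helper, not the simulator's real parser.
 */
int parse_mesh(const char *s, unsigned long *n, unsigned long *m)
{
	char *end;

	if (!isdigit((unsigned char)s[0]))
		return -1;
	*n = strtoul(s, &end, 10);
	if (*end != 'x' || !isdigit((unsigned char)end[1]))
		return -1;
	*m = strtoul(end + 1, &end, 10);
	if (*end != '\0' || *n == 0 || *m == 0)
		return -1;
	return 0;
}
```

With this helper, "4x2" yields the pair (4, 2), while strings such as "4",
"4x" or "0x3" are rejected.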

EXAMPLES:
To run the examples conveniently:
	`examples/helpers/run_exp -h`

The variable `REPOBASEDIR` will probably have to be set correctly. The easiest
way to do so is to run from the root of the repository:
	`REPOBASEDIR=. examples/helpers/run_exp -p TARGET`

TODO:
Multi-threaded simulation;
Better program loading in the networked mode;

NOTES:
To build the simulator:
	* Build dependencies: C99 compiler, POSIX-compatible libc, elf.h
	  (installed via libelf on Linux) and err.h (installed via libbsd
	  on Linux);
	* check config.mk and, if needed, set compiler and flags to your
	  liking;
	* make

To build the examples:
	* Build dependencies: MIPSEL-capable assembler, compiler and linker.
		The assembly files were written for GNU as(1), but they
		should be easy to port to another assembler, since no
		fancy features were used. Anecdotal evidence suggests that
		GNU as(1) is the most widely available option, so in
		practice this is a non-issue.
	* cd examples/src
	* check config.mk and, if needed, set compiler and flags to your liking
	* make
		It is possible to build individual programs by running
		`make PROGRAM_NAME`, where PROGRAM_NAME is the program's file
		name excluding the '.c'

Although it can simulate multiple processors, the simulator itself runs on
a single thread. This makes the examples easier to debug (the simulation is
deterministic, giving the same result every time) at the cost of simulation
speed.

All processors have their IDs set at the beginning of the simulation, and this
value is available in the K0 register.

All instructions other than memory accesses and network communication cost one
cycle.

Memory access instructions cost 1 cycle when running a simulation in network
mode. When running a simulation of N processors in shared memory mode, the
memory can be accessed only once per cycle, so bus arbitration defers
simultaneous accesses by multiple processors. A memory access can therefore
cost anywhere from 1 to N cycles.
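
The 1-to-N range can be illustrated with a small sketch of a bus that serves
one access per cycle. This is not the simulator's arbitration code. It assumes
that all pending requests are issued in the same cycle and that they are
served in ascending processor ID order. The function name `shared_mem_costs`
is hypothetical.

```c
#include <stddef.h>

/*
 * For each CPU i with requesting[i] != 0, store in cost[i] the number of
 * cycles its memory access takes; CPUs that are not requesting get 0.
 * ASSUMPTION: one access is served per cycle, in ascending ID order.
 */
void shared_mem_costs(const int *requesting, unsigned *cost, size_t ncpu)
{
	unsigned served = 0;
	size_t i;

	for (i = 0; i < ncpu; i++)
		cost[i] = requesting[i] ? ++served : 0;
}
```

If all N processors request memory in the same cycle, the first one served
pays 1 cycle and the last one pays N cycles, which matches the bounds stated
above.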

Network communication instructions are implemented using coprocessor 2
operations. In shared memory mode, executing these instructions results in
deadlock. In network mode, an input or alt operation costs 1 + the number of
cycles spent waiting for the message to arrive. An output operation costs
(2 * the number of hops needed to send the message) + the number of cycles
spent waiting for the corresponding input operation to execute. These
operations differ from the transputer's: instead of using channels, the
simulator implements communication by naming the processors involved directly,
which makes it closer to how CSP was initially defined in 1978.
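
The cost formulas above can be written down as a short sketch. The two cost
functions follow the text directly. The hop count is an assumption: it takes
row-major processor IDs and counts hops as the Manhattan distance between two
nodes of the mesh, which may not match the simulator's routing. All function
names here are hypothetical.

```c
#include <stdlib.h>

/*
 * Number of hops between two nodes of a mesh with `cols` columns.
 * ASSUMPTION: row-major IDs and Manhattan-distance routing.
 */
unsigned mesh_hops(unsigned src, unsigned dst, unsigned cols)
{
	int drow = (int)(src / cols) - (int)(dst / cols);
	int dcol = (int)(src % cols) - (int)(dst % cols);

	return (unsigned)(abs(drow) + abs(dcol));
}

/* Input or alt: 1 cycle plus the cycles spent waiting for the message. */
unsigned long input_cost(unsigned long wait_cycles)
{
	return 1 + wait_cycles;
}

/* Output: 2 cycles per hop plus the cycles spent waiting for the input. */
unsigned long output_cost(unsigned hops, unsigned long wait_cycles)
{
	return 2UL * hops + wait_cycles;
}
```

For example, under these assumptions a message from processor 0 to processor
5 on a mesh with 3 columns travels 3 hops, so an output that then waits 4
cycles for the matching input costs 2 * 3 + 4 = 10 cycles.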

If the INSTRDUMP macro is set when compiling (it is not by default), files
called cpuXXXX_instrdump will be created when the program is executed. To
inspect these files, a MIPSEL-capable disassembler is needed (it is possible to
use hexdump(1), but this makes debugging even harder). For example,
using GNU objdump(1):
	`objdump -b binary -m mips -EL -Mno-aliases -D cpuXXXX_instrdump`

If the VERBOSE macro is set when compiling (it is not by default), the
simulator will print additional warning messages to stderr, which may be
useful for debugging.

At the end of the simulation, a file called perfct_PROGRAM_NAME.csv will be
created in the current directory, containing a number of performance counters.

Program loading in network mode is quite dumb and wasteful, since it loads
every processor in the network with the same program.

Always remember to use the correct number of simulated processors when
running the examples; otherwise the simulation can hang forever.

If you read Portuguese and want to check my bachelor thesis for
my analysis of both modes of multiprocessing, it is available at
<https://dalmon.xyz/tg_dalmon.pdf>. Otherwise, to sum it up: the CSP-like mode
of multiprocessing is promising, but I made a mistake in that my
implementations are naive (i.e., they are semantically right, but
performance-wise they are not great). For the producer-consumer problem,
CSP-like multiprocessing gave faster execution times and less code than the
shared memory version. For the dining philosophers problem, CSP-like
multiprocessing was slower and the code was bigger than the shared memory
version.