Update to ares v120 release.
This release provides improved Nintendo 64 and Mega Drive emulation, plus some
emulation speedups and new features like keyboard-mapping for all Nintendo 64
controls.

The infamous Titan Overdrive demos now run with the exception of one glitched
screen each, but you will need to compile with profile=accuracy to enable the
required cycle-based VDP renderer for this to work. I'll try in the future to
support as much of it as I can with the default scanline renderer as well.

Changelog (since v119):

  - SH2: added support for Windows ABI to the recompiler
  - SH2: MAC must increment R[n] before reading from R[m]; fixes Virtua
    Fighter and Toughman Contest
  - SH2: fixed dynarec MOV @rm+,Rn to not increment when Rm==Rn
  - SH2: fixed dynarec TST instruction
  - SH2: improved dynarec accuracy by breaking blocks on delay slot branches
  - SH2: improved dynarec accuracy by decreasing the underclocking amount
  - Mega Drive: implemented undocumented VSRAM and CRAM DMA fill
  - Mega Drive: simplified scanline VDP renderer; fixes TMNT: Tournament
    Fighters graphics
  - Mega Drive: improved region detection; correctly identifies Alien Soldier
    region now
  - Mega CD: fixed crash on game load
  - Super Famicom: fixed direct color mode; fixes Secret of Mana world map
  - WonderSwan Color: fixed initial state for new EEPROMs plus EEPROM size;
    fixes missing sound
  - lucia: fixed saving RAM files when a manual save path was specified
  - ruby: fixed library dependency detection for Arch Linux and other
    distributions
  - SH2: additional correction for dynarec MOV @rm+,Rn instructions
  - hiro/GTK3: added CSS stylesheet overrides to improve appearance [Screwtape]
  - N64: VMRG was not clearing VCO in C++ version [Rasky]
  - N64: fixed "ctc2" in the disassembler
  - Mega Drive: VDP address/command bits are set even for non-register writes
    [Eke]
  - Mega Drive: implemented VDP FIFO with approximated timings
  - Mega Drive: implemented CPU bus arbiter
  - N64: fixed RSP vector unit register values in the disassembler
  - Mega Drive: implemented proper VDP FIFO timings for both reads and writes
    (hopefully)
  - Mega Drive: enabled external and RAM refresh timing
  - Mega Drive: added (Mega Drive | Mega 32X) + Mega CD mode 1 emulation
  - Mega Drive: refactored dot-based VDP renderer
  - ares: added Thread::restart() function to reset a thread without resetting
    its clock
  - Mega Drive: improved DRAM refresh timings
  - Mega Drive: added VDP I/O logger
  - Mega Drive: fixed VDP counter emulation (still inaccurate)
  - Mega Drive: improved VDP DMA and FIFO emulation
  - Nintendo 64: report 64DD as missing for now; fixes F-Zero X
  - Nintendo 64: improved TLB emulation; fixes Conker's Bad Fur Day
  - Nintendo 64: PI DMA from flash always reads data, never the status;
    fixes Paper Mario
  - Mega Drive: improved VDP FIFO emulation
  - Mega Drive: improved VDP dot-renderer
  - nall/bit-range: fixed bug with bit indexes >= 32
  - Mega Drive: fixed VDP read buffer indexing for VSRAM and CRAM
  - Mega Drive: fixed 68K to VDP DMA so that it instantly freezes the CPU
  - Mega Drive: emulated VDP left window hardware glitch where hscroll&15!=0
  - Mega Drive: improved CPU interrupt handling
  - Mega Drive: added APU bus mirrorings
  - Mega Drive: emulated the VDP debug register
  - Mega Drive: fixed VDP VRAM DMA copy
  - Mega Drive: improved 128KB VRAM mode support
  - Mega Drive: mask sprite attribute table address in H40 mode [Sik]
  - Mega Drive: VDP timing improvements
  - Nintendo 64: improved EEPROM support; fixes Perfect Dark
  - Nintendo 64: improved VI interrupt support; fixes Star Wars: Rogue Squadron
    title screen [nodev]
  - Nintendo 64: emulated CIC-NUS-6105 copy protection; fixes Banjo-Tooie
    [XScale]
  - Nintendo 64: added serrate (interlace) support and fixed bug when
    supersampling in serrate mode
  - Nintendo 64: emulated CPU instruction cache
  - Nintendo 64: emulated CPU data cache
  - Nintendo 64: emulated CPU CACHE instruction
  - Nintendo 64: fixed CPU TLB bug; fixes GoldenEye
  - Nintendo 64: began adapting CPU cached interpreter into a dynamic
    recompiler
  - Nintendo 64: began adapting RSP cached interpreter into a dynamic
    recompiler
  - Nintendo 64: added 64-bit addressing and TLB support
  - Nintendo 64: added endian support to [LS][WD][LR] instructions
  - Nintendo 64: improved dynamic recompiler
  - PlayStation: began adapting CPU cached interpreter into a dynamic
    recompiler
  - lucia: allow mapping analog axes separately (allows mapping sticks to
    the keyboard)
  - Nintendo 64: corrected analog stick ranges
  - Nintendo 64: emulate the analog stick as an octagon rather than a circle
    [YetAnotherEmuDev]

[This is actually the second release of v120; the change since the first release
is the N64 analog stick ranges and the octagon gate. -Ed.]
near-san committed May 6, 2021
1 parent 1b574d0 commit 340f48d
Showing 14 changed files with 188 additions and 80 deletions.
4 changes: 2 additions & 2 deletions ares/ares/ares.hpp
@@ -36,7 +36,7 @@ using namespace nall;

 namespace ares {
 static const string Name = "ares";
-static const string Version = "119.16";
+static const string Version = "120";
 static const string Copyright = "Near";
 static const string License = "CC BY-NC-ND 4.0";
 static const string LicenseURI = "https://creativecommons.org/licenses/by-nc-nd/4.0/";
@@ -45,7 +45,7 @@ namespace ares {

 //incremented only when serialization format changes
 static const u32 SerializerSignature = 0x31545342; //"BST1" (little-endian)
-static const string SerializerVersion = "119.2";
+static const string SerializerVersion = "120";

 namespace VFS {
 using Pak = shared_pointer<vfs::directory>;
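
The `//"BST1" (little-endian)` comment on `SerializerSignature` can be checked with a short sketch. The helper `signatureText` is hypothetical and not part of ares; it only lists the four bytes of the value least significant byte first, which is the order a little-endian host would store them in.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper (not from ares): returns the four bytes of a 32-bit
// signature in little-endian order, i.e. least significant byte first.
inline auto signatureText(std::uint32_t signature) -> std::string {
  std::string text;
  for(int shift = 0; shift < 32; shift += 8) {
    text += static_cast<char>((signature >> shift) & 0xff);
  }
  return text;
}
```

Calling `signatureText(0x31545342)` yields the bytes 0x42, 0x53, 0x54, 0x31, which spell `BST1`.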
59 changes: 40 additions & 19 deletions ares/component/processor/sh2/recompiler.cpp
@@ -42,7 +42,12 @@ auto SH2::Recompiler::emit(u32 address) -> Block* {
 bind({block->code, allocator.available()});
 push(rbx);
 push(rbp);
-sub(rsp, imm8(16));
+push(r13);
+if constexpr(abi() == ABI::Windows) {
+push(rsi);
+push(rdi);
+sub(rsp, imm8(0x40));
+}
 mov(rbx, imm64(&self.R[0]));
 mov(rbp, imm64(&self));

@@ -57,13 +62,26 @@ auto SH2::Recompiler::emit(u32 address) -> Block* {
 if(hasBranched || (address & 0xfe) == 0) break; //block boundary
 hasBranched = branched;
 test(rax, rax);
-jz(imm8(7));
-add(rsp, imm8(16));
+if constexpr(abi() == ABI::SystemV) {
+jz(imm8(5));
+}
+if constexpr(abi() == ABI::Windows) {
+jz(imm8(11));
+add(rsp, imm8(0x40));
+pop(rdi);
+pop(rsi);
+}
+pop(r13);
 pop(rbp);
 pop(rbx);
 ret();
 }
-add(rsp, imm8(16));
+if constexpr(abi() == ABI::Windows) {
+add(rsp, imm8(0x40));
+pop(rdi);
+pop(rsi);
+}
+pop(r13);
 pop(rbp);
 pop(rbx);
 ret();
@@ -80,21 +98,6 @@ auto SH2::Recompiler::emit(u32 address) -> Block* {
 #define writeLong &SH2::writeLong
 #define illegal &SH2::illegalInstruction

-template<typename V, typename... P>
-auto SH2::Recompiler::call(V (SH2::*function)(P...)) -> void {
-#if defined(PLATFORM_WINDOWS)
-mov(rcx, rbp);
-mov(r8, rdx);
-mov(rdx, rsi);
-mov(rax, imm64(function));
-call(rax);
-#else
-mov(rdi, rbp);
-mov(rax, imm64(function));
-call(rax);
-#endif
-}
-
 auto SH2::Recompiler::emitInstruction(u16 opcode) -> bool {
 #define n (opcode >> 8 & 0x00f)
 #define m (opcode >> 4 & 0x00f)
@@ -1710,3 +1713,21 @@ auto SH2::Recompiler::emitInstruction(u16 opcode) -> bool {
 #undef writeWord
 #undef writeLong
 #undef illegal
+
+template<typename V, typename... P>
+auto SH2::Recompiler::call(V (SH2::*function)(P...)) -> void {
+static_assert(sizeof...(P) <= 5);
+mov(rax, imm64(function));
+if constexpr(abi() == ABI::SystemV) {
+mov(rdi, rbp);
+}
+if constexpr(abi() == ABI::Windows) {
+if constexpr(sizeof...(P) >= 5) mov(dis8(rsp, 0x28), r9);
+if constexpr(sizeof...(P) >= 4) mov(dis8(rsp, 0x20), r8);
+if constexpr(sizeof...(P) >= 3) mov(r9, rcx);
+if constexpr(sizeof...(P) >= 2) mov(r8, rdx);
+if constexpr(sizeof...(P) >= 1) mov(rdx, rsi);
+mov(rcx, rbp);
+}
+call(rax);
+}
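
The Windows branch of `call()` moves arguments from the System V registers into the Microsoft x64 positions, starting with the last argument. The sketch below replays that sequence on a toy register file to show that no value is overwritten before it has been copied. `Registers` and `shuffleForWindows` are hypothetical names, and the two stack slots are modeled as plain map keys; this simulates the move order only and is not the recompiler itself.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Toy register file keyed by name; stack slots are modeled as extra keys.
using Registers = std::map<std::string, std::uint64_t>;

// Replays the order of moves emitted for ABI::Windows in call().
// Input layout: rbp = this, then up to five arguments in rsi, rdx, rcx, r8, r9
// (the System V integer argument registers that follow rdi).
// Output layout: rcx = this, then rdx, r8, r9, [rsp+0x20], [rsp+0x28].
inline auto shuffleForWindows(Registers r, int parameters) -> Registers {
  auto mov = [&](const std::string& target, const std::string& source) {
    const std::uint64_t value = r[source];
    r[target] = value;
  };
  if(parameters >= 5) mov("[rsp+0x28]", "r9");
  if(parameters >= 4) mov("[rsp+0x20]", "r8");
  if(parameters >= 3) mov("r9", "rcx");
  if(parameters >= 2) mov("r8", "rdx");
  if(parameters >= 1) mov("rdx", "rsi");
  mov("rcx", "rbp");
  return r;
}
```

Running the moves in the opposite order would break calls with three or more parameters: `rcx` would receive `this` before the third argument had been copied out of it.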
3 changes: 2 additions & 1 deletion ares/component/processor/sh2/sh2.hpp
@@ -272,9 +272,10 @@ struct SH2 {
 auto pool(u32 address) -> Pool*;
 auto block(u32 address) -> Block*;
 auto emit(u32 address) -> Block*;
-template<typename R, typename... P> auto call(R (SH2::*function)(P...)) -> void;
 auto emitInstruction(u16 opcode) -> bool;

+template<typename R, typename... P> auto call(R (SH2::*function)(P...)) -> void;
+
 bump_allocator allocator;
 Pool* pools[1 << 24];
 } recompiler{*this};
24 changes: 16 additions & 8 deletions ares/n64/controller/gamepad/gamepad.cpp
@@ -96,17 +96,25 @@ auto Gamepad::read() -> n32 {
 platform->input(z);
 platform->input(start);

-//16-bit signed -> 8-bit signed
-auto ay = sclamp<8>(-y->value() >> 8);
-auto ax = sclamp<8>(+x->value() >> 8);
+//scale {-32768 ... +32767} to {-84 ... +84}
+auto ax = x->value() * 85.0 / 32767.0;
+auto ay = y->value() * 85.0 / 32767.0;

-//dead-zone
-if(abs(ay) < 24) ay = 0;
-if(abs(ax) < 24) ax = 0;
+//bound diagonals to an octagonal range {-68 ... +68}
+if(ax != 0.0 && ay != 0.0) {
+auto slope = ay / ax;
+ax = copysign(min(abs(ax), 85.0 / (abs(slope) + 16.0 / 69.0)), ax);
+ay = copysign(min(abs(ax * slope), 85.0 / (1.0 / abs(slope) + 16.0 / 69.0)), ay);
+ax = ay / slope;
+}
+
+//create dead-zone in range {-15 ... +15}
+if(abs(ax) < 16) ax = 0;
+if(abs(ay) < 16) ay = 0;

 n32 data;
-data.byte(0) = ay;
-data.byte(1) = +ax;
+data.byte(0) = -ay;
+data.byte(1) = +ax;
 data.bit(16) = cameraRight->value();
 data.bit(17) = cameraLeft->value();
 data.bit(18) = cameraDown->value();
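
The new stick shaping in `Gamepad::read()` can be tried out in isolation. This is a minimal sketch that restates the same arithmetic with `<cmath>` and `<algorithm>` in place of the nall helpers. `Stick` and `shapeStick` are hypothetical names. The sketch keeps the values as `double` and leaves out the Y sign flip and the narrowing to a byte that the real code performs when filling `data`.

```cpp
#include <algorithm>
#include <cmath>

struct Stick { double x; double y; };

// Standalone restatement of the scaling, octagon bound and dead-zone steps.
// rawX and rawY are 16-bit signed axis readings.
inline auto shapeStick(int rawX, int rawY) -> Stick {
  // scale so that a full cardinal deflection of 32767 maps to 85
  double ax = rawX * 85.0 / 32767.0;
  double ay = rawY * 85.0 / 32767.0;

  // pull diagonals in toward an octagon; at 45 degrees the bound is about 69
  if(ax != 0.0 && ay != 0.0) {
    double slope = ay / ax;
    ax = std::copysign(std::min(std::abs(ax), 85.0 / (std::abs(slope) + 16.0 / 69.0)), ax);
    ay = std::copysign(std::min(std::abs(ax * slope), 85.0 / (1.0 / std::abs(slope) + 16.0 / 69.0)), ay);
    ax = ay / slope;
  }

  // dead-zone: magnitudes below 16 read as centered
  if(std::abs(ax) < 16.0) ax = 0.0;
  if(std::abs(ay) < 16.0) ay = 0.0;
  return {ax, ay};
}
```

With these formulas a full cardinal push stays at 85 on one axis, a full diagonal push is pulled in to roughly 69 on both axes before integer truncation, and small deflections collapse to zero.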
3 changes: 3 additions & 0 deletions ares/n64/cpu/context.cpp
@@ -18,6 +18,9 @@ auto CPU::Context::setMode() -> void {
 break;
 }

+//todo: 64-bit mode breaks libdragon software, so disable it for now ...
+bits = 32;
+
 if(bits == 32) {
 segment[0] = Segment::Mapped32;
 segment[1] = Segment::Mapped32;
44 changes: 32 additions & 12 deletions ares/n64/cpu/recompiler.cpp
@@ -23,6 +23,11 @@ auto CPU::Recompiler::emit(u32 address) -> Block* {
 push(rbx);
 push(rbp);
 push(r13);
+if constexpr(abi() == ABI::Windows) {
+push(rsi);
+push(rdi);
+sub(rsp, imm8(0x40));
+}
 mov(rbx, imm64(&self.ipu.r[0] + 16));
 mov(rbp, imm64(&self));
 mov(r13, imm64(&self.fpu.r[0] + 16));
@@ -37,17 +42,30 @@ auto CPU::Recompiler::emit(u32 address) -> Block* {
 add(rax, imm8(64));
 mov(mem64(&self.clock), rax);
 }
-call(&CPU::instructionEpilogue, &self);
+call(&CPU::instructionEpilogue);
 address += 4;
 if(hasBranched || (address & 0xfc) == 0) break; //block boundary
 hasBranched = branched;
 test(rax, rax);
-jz(imm8(5));
+if constexpr(abi() == ABI::SystemV) {
+jz(imm8(5));
+}
+if constexpr(abi() == ABI::Windows) {
+jz(imm8(11));
+add(rsp, imm8(0x40));
+pop(rdi);
+pop(rsi);
+}
 pop(r13);
 pop(rbp);
 pop(rbx);
 ret();
 }
+if constexpr(abi() == ABI::Windows) {
+add(rsp, imm8(0x40));
+pop(rdi);
+pop(rsi);
+}
 pop(r13);
 pop(rbp);
 pop(rbx);
@@ -2042,16 +2060,18 @@ auto CPU::Recompiler::emitFPU(u32 instruction) -> bool {

 template<typename V, typename... P>
 auto CPU::Recompiler::call(V (CPU::*function)(P...)) -> void {
-#if defined(PLATFORM_WINDOWS)
-mov(r8, rdx);
-mov(r9, rcx);
-mov(rdx, rsi);
-mov(rcx, rbp);
-mov(rax, imm64(function));
-call(rax);
-#else
-mov(rdi, rbp);
+static_assert(sizeof...(P) <= 5);
 mov(rax, imm64(function));
+if constexpr(abi() == ABI::SystemV) {
+mov(rdi, rbp);
+}
+if constexpr(abi() == ABI::Windows) {
+if constexpr(sizeof...(P) >= 5) mov(dis8(rsp, 0x28), r9);
+if constexpr(sizeof...(P) >= 4) mov(dis8(rsp, 0x20), r8);
+if constexpr(sizeof...(P) >= 3) mov(r9, rcx);
+if constexpr(sizeof...(P) >= 2) mov(r8, rdx);
+if constexpr(sizeof...(P) >= 1) mov(rdx, rsi);
+mov(rcx, rbp);
+}
 call(rax);
-#endif
 }
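
The `jz(imm8(5))` and `jz(imm8(11))` displacements are the byte counts of the early-exit epilogue that the jump skips when `rax` is zero. The sketch below adds up the usual x86-64 encoding sizes to reproduce both numbers. The constant names are hypothetical, and the sizes assume the emitter picks the standard shortest encodings, which this diff does not show.

```cpp
#include <cstddef>

// Encoded sizes in bytes (assumed standard shortest encodings):
constexpr std::size_t popLegacy   = 1;  // pop rbx / rbp / rsi / rdi: one opcode byte
constexpr std::size_t popExtended = 2;  // pop r13: REX prefix + opcode byte
constexpr std::size_t addRspImm8  = 4;  // add rsp, imm8: REX.W + opcode + ModRM + imm8
constexpr std::size_t retNear     = 1;  // ret

// System V epilogue skipped by jz: pop r13; pop rbp; pop rbx; ret
constexpr std::size_t skipSystemV = popExtended + 2 * popLegacy + retNear;

// The Windows epilogue first runs: add rsp, 0x40; pop rdi; pop rsi
constexpr std::size_t skipWindows = addRspImm8 + 2 * popLegacy + skipSystemV;

static_assert(skipSystemV == 5, "matches jz(imm8(5))");
static_assert(skipWindows == 11, "matches jz(imm8(11))");
```

The same arithmetic accounts for the old SH2 value of 7: `add rsp, 16` (4 bytes), two one-byte pops and `ret`.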
3 changes: 2 additions & 1 deletion ares/n64/rdram/rdram.cpp
@@ -25,7 +25,8 @@ auto RDRAM::unload() -> void {

 auto RDRAM::power(bool reset) -> void {
 ram.fill();
-//hacks needed for expansion pak detection:
+//the PIF ROM RDRAM self-test is not working yet,
+//so this hack is needed for expansion pak detection:
 ram.writeWord(0x318, ram.size); //CIC-NUS-6102
 ram.writeWord(0x3f0, ram.size); //CIC-NUS-6105
 io = {};
16 changes: 8 additions & 8 deletions ares/n64/ri/ri.hpp
@@ -26,14 +26,14 @@ struct RI : Memory::IO<RI> {
 auto serialize(serializer&) -> void;

 struct IO {
-u32 mode = 0x0e;
-u32 config = 0x40;
-u32 currentLoad = 0x00;
-u32 select = 0x14;
-u32 refresh = 0x0006'3634;
-u32 latency = 0;
-u32 readError = 0;
-u32 writeError = 0;
+n32 mode = 0x0e;
+n32 config = 0x40;
+n32 currentLoad = 0x00;
+n32 select = 0x14;
+n32 refresh = 0x0006'3634;
+n32 latency = 0;
+n32 readError = 0;
+n32 writeError = 0;
 } io;
 };

42 changes: 31 additions & 11 deletions ares/n64/rsp/recompiler.cpp
@@ -40,6 +40,11 @@ auto RSP::Recompiler::emit(u32 address) -> Block* {
 push(rbx);
 push(rbp);
 push(r13);
+if constexpr(abi() == ABI::Windows) {
+push(rsi);
+push(rdi);
+sub(rsp, imm8(0x40));
+}
 mov(rbx, imm64(&self.ipu.r[0]));
 mov(rbp, imm64(&self));
 mov(r13, imm64(&self.vpu.r[0]));
@@ -56,12 +61,25 @@ auto RSP::Recompiler::emit(u32 address) -> Block* {
 if(hasBranched || (address & 0xffc) == 0) break; //IMEM boundary
 hasBranched = branched;
 test(rax, rax);
-jz(imm8(5));
+if constexpr(abi() == ABI::SystemV) {
+jz(imm8(5));
+}
+if constexpr(abi() == ABI::Windows) {
+jz(imm8(11));
+add(rsp, imm8(0x40));
+pop(rdi);
+pop(rsi);
+}
 pop(r13);
 pop(rbp);
 pop(rbx);
 ret();
 }
+if constexpr(abi() == ABI::Windows) {
+add(rsp, imm8(0x40));
+pop(rdi);
+pop(rsi);
+}
 pop(r13);
 pop(rbp);
 pop(rbx);
@@ -1428,16 +1446,18 @@ auto RSP::Recompiler::emitSWC2(u32 instruction) -> bool {

 template<typename V, typename... P>
 auto RSP::Recompiler::call(V (RSP::*function)(P...)) -> void {
-#if defined(PLATFORM_WINDOWS)
-mov(r8, rdx);
-mov(r9, rcx);
-mov(rdx, rsi);
-mov(rcx, rbp);
-mov(rax, imm64(function));
-call(rax);
-#else
-mov(rdi, rbp);
+static_assert(sizeof...(P) <= 5);
 mov(rax, imm64(function));
+if constexpr(abi() == ABI::SystemV) {
+mov(rdi, rbp);
+}
+if constexpr(abi() == ABI::Windows) {
+if constexpr(sizeof...(P) >= 5) mov(dis8(rsp, 0x28), r9);
+if constexpr(sizeof...(P) >= 4) mov(dis8(rsp, 0x20), r8);
+if constexpr(sizeof...(P) >= 3) mov(r9, rcx);
+if constexpr(sizeof...(P) >= 2) mov(r8, rdx);
+if constexpr(sizeof...(P) >= 1) mov(rdx, rsi);
+mov(rcx, rbp);
+}
 call(rax);
-#endif
 }
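
The Windows prologue pushes `rsi` and `rdi`, which are callee-saved under the Microsoft x64 convention, and then reserves 0x40 bytes. Both ABIs expect `rsp` to be 16-byte aligned at each `call`, and the Microsoft convention also expects 32 bytes of shadow space below any stack arguments. The sketch below checks that arithmetic; `stackDepth` and the constants are hypothetical names used only for illustration.

```cpp
#include <cstddef>

// Distance in bytes between rsp after the block prologue and the 16-byte
// aligned rsp the caller had just before its call instruction:
// return address + pushed registers + explicitly reserved area.
constexpr auto stackDepth(std::size_t pushes, std::size_t reserved) -> std::size_t {
  return 8 + pushes * 8 + reserved;
}

constexpr std::size_t depthSystemV = stackDepth(3, 0);     // rbx, rbp, r13
constexpr std::size_t depthWindows = stackDepth(5, 0x40);  // plus rsi, rdi and sub rsp, 0x40

// shadow space (0x20) plus the two stack argument slots at 0x20 and 0x28
constexpr std::size_t windowsCallArea = 0x20 + 2 * 8;

static_assert(depthSystemV % 16 == 0, "rsp is aligned for calls under System V");
static_assert(depthWindows % 16 == 0, "rsp is aligned for calls under Windows");
static_assert(windowsCallArea <= 0x40, "reserved area covers shadow space and stack arguments");
```

Under these assumptions both prologues leave the stack aligned, and the 0x40-byte reservation has 16 bytes to spare beyond the slots that `call()` writes.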
8 changes: 5 additions & 3 deletions ares/n64/si/si.cpp
@@ -95,10 +95,12 @@ auto SI::scan() -> void {

 n3 channel = 0;
 for(u32 offset = 0; offset < 64;) {
-n6 send = pi.ram.readByte(offset++);
+n8 send = pi.ram.readByte(offset++);
 if(send == 0x00) { channel++; continue; }
-if(send == 0x3e) break;
-if(send == 0x3f) continue;
+if(send == 0xfd) continue; //channel reset
+if(send == 0xfe) break; //end of packets
+if(send == 0xff) continue; //alignment padding
+send &= 0x3f;
 n8 recvOffset = offset;
 n6 recv = pi.ram.readByte(offset++);
 n8 input[64];
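
Reading `send` as a full byte matters because the old 6-bit read could not tell a control byte from an ordinary length with the same low bits. The sketch below restates the new checks as a small classifier. `Action`, `classify` and `sendLength` are hypothetical names, and the labels are taken from the comments in the diff.

```cpp
#include <cstdint>

enum class Action { SkipChannel, ChannelReset, EndOfPackets, Padding, Packet };

// Classifies the first byte of a command slot using the full 8-bit value,
// in the same order as the checks in SI::scan().
constexpr auto classify(std::uint8_t send) -> Action {
  return send == 0x00 ? Action::SkipChannel
       : send == 0xfd ? Action::ChannelReset
       : send == 0xfe ? Action::EndOfPackets
       : send == 0xff ? Action::Padding
       : Action::Packet;
}

// For an ordinary packet only the low six bits carry the transmit length.
constexpr auto sendLength(std::uint8_t send) -> std::uint8_t {
  return static_cast<std::uint8_t>(send & 0x3f);
}
```

With the 8-bit comparison, `0x3e` is an ordinary packet of length 0x3e, whereas the old `n6` read treated any byte whose low six bits were 0x3e as the end marker.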