Use a lookup table for return opcode
Summary:
Instead of generating a series of branches and relying on the compiler
to do something efficient, we can eliminate the branches entirely by
using a lookup table packed into a `uint32_t`. While this is slightly
more expensive in terms of instructions, it produces no branches and
results in a net perf win.
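
A minimal, self-contained sketch of that idea (with made-up opcode numbers and sizes, not the ones Hermes actually emits): each call form stores its size delta in a 2-bit field of a single 32-bit constant, and the size is recovered with a shift and a mask instead of a branch chain.

#include <cstdint>

// Suppose three call opcodes numbered 70..72 with instruction sizes 4, 6, 5.
// Pack (size - 4) into 2-bit fields: field i holds the delta for opcode 70 + i.
constexpr uint32_t kPackedDeltas = (0u << 0) | (2u << 2) | (1u << 4);

constexpr uint32_t callInstSize(uint8_t opcode) {
  // Branch-free: select the field, mask it, add back the common base size.
  return ((kPackedDeltas >> ((opcode - 70) * 2)) & 0x3) + 4;
}

static_assert(callInstSize(70) == 4, "delta 0 -> size 4");
static_assert(callInstSize(71) == 6, "delta 2 -> size 6");
static_assert(callInstSize(72) == 5, "delta 1 -> size 5");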

Reviewed By: tmikov

Differential Revision: D32229103

fbshipit-source-id: 9a332e51d0d910c41e4ee07e3146efbc61618a12
neildhar authored and facebook-github-bot committed Nov 9, 2021
1 parent afe1dd1 commit 2219f4c
Showing 1 changed file with 39 additions and 24 deletions.
lib/VM/Interpreter.cpp: 39 additions & 24 deletions
@@ -749,34 +749,49 @@ LLVM_ATTRIBUTE_ALWAYS_INLINE
 static inline const Inst *nextInstCall(const Inst *ip) {
   HERMES_SLOW_ASSERT(isCallType(ip->opCode) && "ip is not of call type");
 
-  // The following is written to elicit compares instead of table lookup.
-  // The idea is to present code like so:
-  //   if (opcode <= 70) return ip + 4;
-  //   if (opcode <= 71) return ip + 4;
-  //   if (opcode <= 72) return ip + 4;
-  //   if (opcode <= 73) return ip + 5;
-  //   if (opcode <= 74) return ip + 5;
-  //   ...
-  // and the compiler will retain only compares where the result changes (here,
-  // 72 and 74). This allows us to compute the next instruction using three
-  // compares, instead of a naive compare-per-call type (or lookup table).
-  //
-  // Statically verify that increasing call opcodes correspond to monotone
-  // instruction sizes; this enables the compiler to do a better job optimizing.
-  constexpr bool callSizesMonotoneIncreasing = monotoneIncreasing(
+  // To avoid needing a large lookup table or switchcase, the following packs
+  // information about the size of each call opcode into a uint32_t. Each call
+  // type is represented with two bits, representing how much larger it is than
+  // the smallest call instruction.
+  // If we used 64 bits, we could fit the actual size of each call, without
+  // needing the offset, and this may be necessary if new call instructions are
+  // added in the future. For now however, due to limitations on loading large
+  // immediates in ARM, it is significantly more efficient to use a uint32_t
+  // than a uint64_t.
+  constexpr auto firstCall = std::min({
+#define DEFINE_RET_TARGET(name) static_cast<uint8_t>(OpCode::name),
+#include "hermes/BCGen/HBC/BytecodeList.def"
+  });
+  constexpr auto lastCall = std::max({
+#define DEFINE_RET_TARGET(name) static_cast<uint8_t>(OpCode::name),
+#include "hermes/BCGen/HBC/BytecodeList.def"
+  });
+  constexpr auto minSize = std::min({
 #define DEFINE_RET_TARGET(name) sizeof(inst::name##Inst),
 #include "hermes/BCGen/HBC/BytecodeList.def"
-      SIZE_MAX // sentinel avoiding a trailing comma.
-  );
-  static_assert(
-      callSizesMonotoneIncreasing,
-      "Call instruction sizes are not monotone increasing");
-#define DEFINE_RET_TARGET(name) \
-  if (ip->opCode <= OpCode::name) \
-    return NEXTINST(name);
+  });
+  constexpr auto maxSize = std::max({
+#define DEFINE_RET_TARGET(name) sizeof(inst::name##Inst),
+#include "hermes/BCGen/HBC/BytecodeList.def"
+  });
+
+  constexpr uint32_t W = 2;
+  constexpr uint32_t mask = (1 << W) - 1;
+
+  static_assert(llvh::isUInt<W>(maxSize - minSize), "Size range too large.");
+  static_assert((lastCall - firstCall + 1) * W <= 32, "Too many call opcodes.");
+
+  constexpr uint32_t callSizes = 0
+#define DEFINE_RET_TARGET(name)                 \
+  |                                             \
+      ((sizeof(inst::name##Inst) - minSize)     \
+       << (((uint8_t)OpCode::name - firstCall) * W))
 #include "hermes/BCGen/HBC/BytecodeList.def"
-  llvm_unreachable("Not a call type");
+      ;
 #undef DEFINE_RET_TARGET
+
+  const uint8_t offset = static_cast<uint8_t>(ip->opCode) - firstCall;
+  return IPADD(((callSizes >> (offset * W)) & mask) + minSize);
 }
 
 CallResult<HermesValue> Runtime::interpretFunctionImpl(

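A note on how the table in the diff is built: it relies on the X-macro pattern, where DEFINE_RET_TARGET is redefined before each re-include of BytecodeList.def so that the same opcode list expands into the min/max folds and into the OR-fold that packs callSizes. Below is a hedged, self-contained sketch of that pattern with a fake opcode list and made-up sizes (CALL_LIST, the OpCode values, kSize, and SIZEOF_INST are stand-ins for illustration, not Hermes definitions).

#include <algorithm>
#include <cstdint>

// Stand-in for re-including hermes/BCGen/HBC/BytecodeList.def: expanding
// CALL_LIST invokes DEFINE_RET_TARGET once per (hypothetical) call opcode.
#define CALL_LIST             \
  DEFINE_RET_TARGET(Call)     \
  DEFINE_RET_TARGET(CallLong) \
  DEFINE_RET_TARGET(Construct)

enum class OpCode : uint8_t { Call = 70, CallLong = 71, Construct = 72 };
constexpr uint8_t kSize[] = {4, 6, 5}; // hypothetical sizes, indexed by opcode - 70
#define SIZEOF_INST(name) kSize[(uint8_t)OpCode::name - 70]

// Same shape as the diff: redefine the macro, expand the list, fold the values.
#define DEFINE_RET_TARGET(name) static_cast<uint8_t>(OpCode::name),
constexpr auto firstCall = std::min({CALL_LIST});
#undef DEFINE_RET_TARGET

#define DEFINE_RET_TARGET(name) SIZEOF_INST(name),
constexpr auto minSize = std::min({CALL_LIST});
#undef DEFINE_RET_TARGET

// OR together one 2-bit field per opcode, holding (size - minSize).
#define DEFINE_RET_TARGET(name)              \
  | ((uint32_t)(SIZEOF_INST(name) - minSize) \
     << (((uint8_t)OpCode::name - firstCall) * 2))
constexpr uint32_t callSizes = 0 CALL_LIST;
#undef DEFINE_RET_TARGET

// With sizes {4, 6, 5} and minSize 4, the packed deltas are 0, 2, 1.
static_assert(callSizes == ((0u << 0) | (2u << 2) | (1u << 4)), "packed deltas");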