Skip to content

Commit

Permalink
Add hs_try_putmvar()
Browse files Browse the repository at this point in the history
Summary:
This is a fast, non-blocking, asynchronous, interface to tryPutMVar that
can be called from C/C++.

It's useful for callback-based C/C++ APIs: the idea is that the callback
invokes hs_try_putmvar(), and the Haskell code waits for the callback to
run by blocking in takeMVar.

The callback doesn't block - this is often a requirement of
callback-based APIs.  The callback wakes up the Haskell thread with
minimal overhead and no unnecessary context-switches.

There are a couple of benchmarks in
testsuite/tests/concurrent/should_run.  Some example results comparing
hs_try_putmvar() with using a standard foreign export:

    ./hs_try_putmvar003 1 64 16 100 +RTS -s -N4     0.49s
    ./hs_try_putmvar003 2 64 16 100 +RTS -s -N4     2.30s

hs_try_putmvar() is 4x faster for this workload (see the source for
hs_try_putmvar003.hs for details of the workload).

An alternative solution is to use the IO Manager for this.  We've tried
it, but there are problems with that approach:
* Need to create a new file descriptor for each callback
* The IO Manger thread(s) become a bottleneck
* More potential for things to go wrong, e.g. throwing an exception in
  an IO Manager callback kills the IO Manager thread.

Test Plan: validate; new unit tests

Reviewers: niteria, erikd, ezyang, bgamari, austin, hvr

Subscribers: thomie

Differential Revision: https://phabricator.haskell.org/D2501
  • Loading branch information
simonmar committed Sep 12, 2016
1 parent 0e7ccf6 commit 454033b
Show file tree
Hide file tree
Showing 24 changed files with 715 additions and 10 deletions.
119 changes: 119 additions & 0 deletions docs/users_guide/ffi-chap.rst
Original file line number Diff line number Diff line change
Expand Up @@ -616,6 +616,125 @@ the threads have exited first. (Unofficially, if you want to use this
fast and loose version of ``hs_exit()``, then call
``shutdownHaskellAndExit()`` instead).

.. _hs_try_putmvar:

Waking up Haskell threads from C
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Sometimes we want to be able to wake up a Haskell thread from some C
code. For example, when using a callback-based C API, we register a C
callback and then we need to wait for the callback to run.

One way to do this is to create a ``foreign export`` that will do
whatever needs to be done to wake up the Haskell thread - perhaps
``putMVar`` - and then call this from our C callback. There are a
couple of problems with this:

1. Calling a foreign export has a lot of overhead: it creates a
complete new Haskell thread, for example.
2. The call may block for a long time if a GC is in progress. We
can't use this method if the C API we're calling doesn't allow
blocking in the callback.

For these reasons GHC provides an external API to ``tryPutMVar``,
``hs_try_putmvar``, which you can use to cheaply and asynchronously
wake up a Haskell thread from C/C++.

.. code-block:: c
void hs_try_putmvar (int capability, HsStablePtr sp);
The C call ``hs_try_putmvar(cap, mvar)`` is equivalent to the Haskell
call ``tryPutMVar mvar ()``, except that it is

* non-blocking: takes a bounded, short, amount of time

* asynchronous: the actual putMVar may be performed after the call
returns (for example, if the RTS is currently garbage collecting).
That's why ``hs_try_putmvar()`` doesn't return a result to say
whether the put succeeded. It is your responsibility to ensure that
the ``MVar`` is empty; if it is full, ``hs_try_putmvar()`` will have
no effect.

**Example**. Suppose we have a C/C++ function to call that will return and then
invoke a callback at some point in the future, passing us some data.
We want to wait in Haskell for the callback to be called, and retrieve
the data. We can do it like this:

.. code-block:: haskell
import GHC.Conc (newStablePtrPrimMVar, PrimMVar)
makeExternalCall = mask_ $ do
mvar <- newEmptyMVar
sp <- newStablePtrPrimMVar mvar
fp <- mallocForeignPtr
withForeignPtr fp $ \presult -> do
cap <- threadCapability =<< myThreadId
scheduleCallback sp cap presult
takeMVar mvar `onException`
forkIO (do takeMVar mvar; touchForeignPtr fp)
peek presult
foreign import ccall "scheduleCallback"
scheduleCallback :: StablePtr PrimMVar
-> Int
-> Ptr Result
-> IO ()
And inside ``scheduleCallback``, we create a callback that will in due
course store the result data in the ``Ptr Result``, and then call
``hs_try_putmvar()``.

There are a few things to note here.

* There's a special function to create the ``StablePtr``:
``newStablePtrPrimMVar``, because the RTS needs a ``StablePtr`` to
the primitive ``MVar#`` object, and we can't create that directly.
Do *not* just use ``newStablePtr`` on the ``MVar``: your program
will crash.

* The ``StablePtr`` is freed by ``hs_try_putmvar()``. This is because
it would otherwise be difficult to arrange to free the ``StablePtr``
reliably: we can't free it in Haskell, because if the ``takeMVar``
is interrupted by an asynchronous exception, then the callback will
fire at a later time. We can't free it in C, because we don't know
when to free it (not when ``hs_try_putmvar()`` returns, because that
is an async call that uses the ``StablePtr`` at some time in the
future).

* The ``mask_`` is to avoid asynchronous exceptions before the
``scheduleCallback`` call, which would leak the ``StablePtr``.

* We find out the current capability number and pass it to C. This is
passed back to ``hs_try_putmvar``, and helps the RTS to know which
capability it should try to perform the ``tryPutMVar`` on. If you
don't care, you can pass ``-1`` for the capability to
``hs_try_putmvar``, and it will pick an arbitrary one.

Picking the right capability will help avoid unnecessary context
switches. Ideally you should pass the capability that the thread
that will be woken up last ran on, which you can find by calling
``threadCapability`` in Haskell.

* If you want to also pass some data back from the C callback to
Haskell, this is best done by first allocating some memory in
Haskell to receive the data, and passing the address to C, as we did
in the above example.

* ``takeMVar`` can be interrupted by an asynchronous exception. If
this happens, the callback in C will still run at some point in the
future, will still write the result, and will still call
``hs_try_putmvar()``. Therefore we have to arrange that the memory
for the result stays alive until the callback has run, so if an
exception is thrown during ``takeMVar`` we fork another thread to
wait for the callback and hold the memory alive using
``touchForeignPtr``.

For a fully working example, see
``testsuite/tests/concurrent/should_run/hs_try_putmvar001.hs`` in the
GHC source tree.

.. _ffi-floating-point:

Floating point and the FFI
Expand Down
4 changes: 4 additions & 0 deletions includes/HsFFI.h
Original file line number Diff line number Diff line change
Expand Up @@ -113,8 +113,12 @@ extern StgPtr hs_spt_lookup(StgWord64 key[2]);
extern int hs_spt_keys(StgPtr keys[], int szKeys);
extern int hs_spt_key_count (void);

extern void hs_try_putmvar (int capability, HsStablePtr sp);

/* -------------------------------------------------------------------------- */



#ifdef __cplusplus
}
#endif
Expand Down
2 changes: 2 additions & 0 deletions libraries/base/GHC/Conc.hs
Original file line number Diff line number Diff line change
Expand Up @@ -50,6 +50,8 @@ module GHC.Conc
, threadStatus
, threadCapability

, newStablePtrPrimMVar, PrimMVar

-- * Waiting
, threadDelay
, registerDelay
Expand Down
14 changes: 14 additions & 0 deletions libraries/base/GHC/Conc/Sync.hs
Original file line number Diff line number Diff line change
Expand Up @@ -59,6 +59,8 @@ module GHC.Conc.Sync
, threadStatus
, threadCapability

, newStablePtrPrimMVar, PrimMVar

-- * Allocation counter and quota
, setAllocationCounter
, getAllocationCounter
Expand Down Expand Up @@ -117,6 +119,7 @@ import GHC.MVar
import GHC.Ptr
import GHC.Real ( fromIntegral )
import GHC.Show ( Show(..), showString )
import GHC.Stable ( StablePtr(..) )
import GHC.Weak

infixr 0 `par`, `pseq`
Expand Down Expand Up @@ -615,6 +618,17 @@ mkWeakThreadId t@(ThreadId t#) = IO $ \s ->
(# s1, w #) -> (# s1, Weak w #)


data PrimMVar

-- | Make a StablePtr that can be passed to the C function
-- @hs_try_putmvar()@. The RTS wants a 'StablePtr' to the underlying
-- 'MVar#', but a 'StablePtr#' can only refer to lifted types, so we
-- have to cheat by coercing.
newStablePtrPrimMVar :: MVar () -> IO (StablePtr PrimMVar)
newStablePtrPrimMVar (MVar m) = IO $ \s0 ->
case makeStablePtr# (unsafeCoerce# m :: PrimMVar) s0 of
(# s1, sp #) -> (# s1, StablePtr sp #)

-----------------------------------------------------------------------------
-- Transactional heap operations
-----------------------------------------------------------------------------
Expand Down
1 change: 1 addition & 0 deletions rts/Capability.c
Original file line number Diff line number Diff line change
Expand Up @@ -266,6 +266,7 @@ initCapability (Capability *cap, uint32_t i)
cap->returning_tasks_tl = NULL;
cap->n_returning_tasks = 0;
cap->inbox = (Message*)END_TSO_QUEUE;
cap->putMVars = NULL;
cap->sparks = allocSparkPool();
cap->spark_stats.created = 0;
cap->spark_stats.dud = 0;
Expand Down
13 changes: 12 additions & 1 deletion rts/Capability.h
Original file line number Diff line number Diff line change
Expand Up @@ -123,6 +123,7 @@ struct Capability_ {
// returning_tasks_{hd,tl}
// wakeup_queue
// inbox
// putMVars
Mutex lock;

// Tasks waiting to return from a foreign call, or waiting to make
Expand All @@ -138,6 +139,10 @@ struct Capability_ {
// Locks required: cap->lock
Message *inbox;

// putMVars are really messages, but they're allocated with malloc() so they
// can't go on the inbox queue: the GC would get confused.
struct PutMVar_ *putMVars;

SparkPool *sparks;

// Stats on spark creation/conversion
Expand Down Expand Up @@ -378,6 +383,11 @@ extern uint32_t numa_map[MAX_NUMA_NODES];
Messages
-------------------------------------------------------------------------- */

typedef struct PutMVar_ {
StgStablePtr mvar;
struct PutMVar_ *link;
} PutMVar;

#ifdef THREADED_RTS

INLINE_HEADER rtsBool emptyInbox(Capability *cap);
Expand Down Expand Up @@ -459,7 +469,8 @@ contextSwitchCapability (Capability *cap)

INLINE_HEADER rtsBool emptyInbox(Capability *cap)
{
return (cap->inbox == (Message*)END_TSO_QUEUE);
return (cap->inbox == (Message*)END_TSO_QUEUE &&
cap->putMVars == NULL);
}

#endif
Expand Down
2 changes: 2 additions & 0 deletions rts/Prelude.h
Original file line number Diff line number Diff line change
Expand Up @@ -24,6 +24,7 @@
* modules these names are defined in.
*/

PRELUDE_CLOSURE(ghczmprim_GHCziTuple_Z0T_closure);
PRELUDE_CLOSURE(ghczmprim_GHCziTypes_True_closure);
PRELUDE_CLOSURE(ghczmprim_GHCziTypes_False_closure);
PRELUDE_CLOSURE(base_GHCziPack_unpackCString_closure);
Expand Down Expand Up @@ -87,6 +88,7 @@ PRELUDE_INFO(base_GHCziWord_W64zh_con_info);
PRELUDE_INFO(base_GHCziStable_StablePtr_static_info);
PRELUDE_INFO(base_GHCziStable_StablePtr_con_info);

#define Unit_closure DLL_IMPORT_DATA_REF(ghczmprim_GHCziTuple_Z0T_closure)
#define True_closure DLL_IMPORT_DATA_REF(ghczmprim_GHCziTypes_True_closure)
#define False_closure DLL_IMPORT_DATA_REF(ghczmprim_GHCziTypes_False_closure)
#define unpackCString_closure DLL_IMPORT_DATA_REF(base_GHCziPack_unpackCString_closure)
Expand Down
8 changes: 8 additions & 0 deletions rts/PrimOps.cmm
Original file line number Diff line number Diff line change
Expand Up @@ -1739,6 +1739,13 @@ loop:
}


// NOTE: there is another implementation of this function in
// Threads.c:performTryPutMVar(). Keep them in sync! It was
// measurably slower to call the C function from here (70% for a
// tight loop doing tryPutMVar#).
//
// TODO: we could kill the duplication by making tryPutMVar# into an
// inline primop that expands into a C call to performTryPutMVar().
stg_tryPutMVarzh ( P_ mvar, /* :: MVar a */
P_ val, /* :: a */ )
{
Expand Down Expand Up @@ -1812,6 +1819,7 @@ loop:
return (1);
}


stg_readMVarzh ( P_ mvar, /* :: MVar a */ )
{
W_ val, info, tso, q;
Expand Down
75 changes: 75 additions & 0 deletions rts/RtsAPI.c
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,7 @@
#include "Schedule.h"
#include "Capability.h"
#include "Stable.h"
#include "Threads.h"
#include "Weak.h"

/* ----------------------------------------------------------------------------
Expand Down Expand Up @@ -620,3 +621,77 @@ void rts_done (void)
freeMyTask();
}

/* -----------------------------------------------------------------------------
tryPutMVar from outside Haskell
The C call
hs_try_putmvar(cap, mvar)
is equivalent to the Haskell call
tryPutMVar mvar ()
but it is
* non-blocking: takes a bounded, short, amount of time
* asynchronous: the actual putMVar may be performed after the
call returns. That's why hs_try_putmvar() doesn't return a
result to say whether the put succeeded.
NOTE: this call transfers ownership of the StablePtr to the RTS, which will
free it after the tryPutMVar has taken place. The reason is that otherwise,
it would be very difficult for the caller to arrange to free the StablePtr
in all circumstances.
For more details, see the section "Waking up Haskell threads from C" in the
User's Guide.
-------------------------------------------------------------------------- */

void hs_try_putmvar (/* in */ int capability,
/* in */ HsStablePtr mvar)
{
Task *task = getTask();
Capability *cap;

if (capability < 0) {
capability = task->preferred_capability;
if (capability < 0) {
capability = 0;
}
}
cap = capabilities[capability % enabled_capabilities];

#if !defined(THREADED_RTS)

performTryPutMVar(cap, (StgMVar*)deRefStablePtr(mvar), Unit_closure);
freeStablePtr(mvar);

#else

ACQUIRE_LOCK(&cap->lock);
// If the capability is free, we can perform the tryPutMVar immediately
if (cap->running_task == NULL) {
cap->running_task = task;
task->cap = cap;
RELEASE_LOCK(&cap->lock);

performTryPutMVar(cap, (StgMVar*)deRefStablePtr(mvar), Unit_closure);

freeStablePtr(mvar);

// Wake up the capability, which will start running the thread that we
// just awoke (if there was one).
releaseCapability(cap);
} else {
PutMVar *p = stgMallocBytes(sizeof(PutMVar),"hs_try_putmvar");
// We cannot deref the StablePtr if we don't have a capability,
// so we have to store it and deref it later.
p->mvar = mvar;
p->link = cap->putMVars;
cap->putMVars = p;
RELEASE_LOCK(&cap->lock);
}

#endif
}
Loading

0 comments on commit 454033b

Please sign in to comment.