Skip to content
This repository has been archived by the owner on Oct 16, 2023. It is now read-only.

RFC: New module 'rtt' and a bunch of changes in aid of what it does #48

Merged
merged 5 commits into from
Apr 11, 2016

Conversation

zackw
Copy link
Contributor

@zackw zackw commented Mar 29, 2016

This PR adds a new module rtt.py, which measures round-trip times through exits, and makes a bunch of improvements to the core in aid of what it does. Please see individual commit messages for explanations.

I'm happy with the changes to the core. However, please note that rtt.py is not yet fully reliable; I could use some debugging help. It starts out working well, but if you run it over lots of exits and destinations, more and more connections fail as time goes by, with errors like

2016-03-27 10:23:42,862 [WARNING]: Failed to attach stream because: Unknown circuit "None"

and

  File ".../exitmap/src/modules/rtt.py", line 148, in perform_probes
    sock = socket.socket(AF_INET, SOCK_STREAM)
  File ".../exitmap/src/torsocks.py", line 343, in torsocket
    return _Torsocket(family, type, proto, _sock)
  File ".../exitmap/src/torsocks.py", line 114, in __init__
    self._authenticate()
  File ".../exitmap/src/torsocks.py", line 181, in _authenticate
    send_queue(self.getsockname())
  File ".../exitmap/src/torsocks.py", line 79, in send_queue
    queue.put([circ_id, sock_name])
  File "<string>", line 2, in put
  File "/usr/lib/python2.7/multiprocessing/managers.py", line 758, in _callmethod
    conn.send((self._id, methodname, args, kwds))
IOError: [Errno 32] Broken pipe

zackw added 5 commits March 29, 2016 17:54
eventhandler.EventHandler.check_finished is called by both the queue
thread and the main thread, but the tear-down operations that it performs
must only happen once.  The call to sys.exit() DOES NOT prevent the
tear-down from happening twice.  (Is the "queue_thread.daemon = False"
line in EventHandler.__init__ perhaps meant to be "= True"?  It wouldn't
fix the race here, but it might prevent other problems, e.g. the program
hanging forever if a module throws an uncaught exception.)  Anyway,
add an explicit lock.
This was a major revision to the torsocks module, but the _effect_
is simple to explain: a probe module can now do

    s = socket.socket(AF_UNIX, SOCK_STREAM)
    s.setblocking(False)
    err = s.connect_ex(addr)
    if err == errno.EINPROGRESS:
        select.select([s], [], [])

and the select() will return when the connection is complete.  (In a real
module that needs this, one would have _several_ pending connections and
process them as they become available.)  Note that there is a visible
difference from how a normal socket behaves, here: you select for the
socket to become _readable_ before continuing.  A normal socket becomes
_writable_ when the TCP handshake completes.

There's a bunch of secondary work in here in aid of making the emulation
of non-blocking half-open socket behavior as accurate as possible, and
an ancillary change (too difficult to disentangle into its own commit)
in which we make the monkey-patching of socket.socket more robust.  It
is no longer necessary to put back the original socket.socket around
event queue operations, and the monkey-patching will be 100% cleaned
up if a module throws an uncaught exception.
This is forward-compatibility for the next change, in which
a new kwarg (that none of them need, but a new module does)
will be added.
Formerly, only exits that could connect to _all_ of a module's requested
destinations were included in the set of exits to be probed.  Now,
instead, every exit that can connect to _at least one_ of a module's
requested destinations will be included, and probe() receives a new
keyword argument, destinations=, that tells it which destinations it
can use for each exit.  (The destinations list will be a proper subset
of module.destinations, with all hostnames resolved to IP addresses.)

This does not affect the behavior of any existing module, because all
of the existing modules use only a single destination.
The preceding changes are all in aid of this module.  Note that it
currently isn't 100% robust.  It works well for _one_ exit, but
if you try to run it over many destinations and all exits, fewer
and fewer connections will work correctly as time goes by.  I could
use some debugging help.

selectors34.py and six.py are third-party code licensed under MIT-like
licenses.  They are backports of Python 3.(>=4) standard library
functionality to Python 2, and redistributing them in this fashion
is encouraged by their authors.
@NullHypothesis
Copy link
Owner

These changes look great, thanks Zack! Would you mind running the pep8 tool over your files? It analyses the code and points out lines that violate Python's PEP 8 style guide, which I'm trying to honour in exitmap.

Regarding your issue with rtt.py: I wonder if your module triggers some heuristic in Tor that makes it close these circuits. Have you tried starting exitmap in debug mode and looking at all circuit events that happen for a circuit before it causes this error message?

@zackw
Copy link
Contributor Author

zackw commented Apr 7, 2016

Would you mind running the pep8 tool over your files? It analyses the code and points out lines that violate Python's PEP 8 style guide, which I'm trying to honour in exitmap.

This makes a large number of suggestions that IMHO harm readability. The biggest problem is with code of the form

    exits = get_exits(args.data_dir,
                      country_code = args.countrycode,
                      bad_exit     = args.badexit,
                      good_exit    = args.goodexit,
                      version      = args.version,
                      nickname     = args.nickname,
                      address      = args.address)

It is only happy if I write

    exits = get_exits(args.data_dir,
                      country_code=args.countrycode,
                      bad_exit=args.badexit,
                      good_exit=args.goodexit,
                      version=args.version,
                      nickname=args.nickname,
                      address=args.address)

which I think is significantly less readable. This occurs in a whole bunch of places, since there are many functions in this codebase that take lots of keyword arguments. It is also insisting on the insertion of blank lines within in a block of stub methods, e.g.

    # These sockets can only be used as connected sockets.
    def sendto(self, *a): raise NotImplementedError
    def recvfrom(self, *a): raise NotImplementedError
    def recvfrom_into(self, *a): raise NotImplementedError

which would destroy the logical grouping.

Also, pep8 conformance changes to selectors34.py and six.py would be inappropriate, since they are imported verbatim from elsewhere.

I have made all of the pep8 changes that I think improve, or at least do no harm, to readability, and I also have a fix for one of the CI failures (not sure what's up with the other two) but before I revise the pull request I would like to know if you insist on pep8 conformance even in the above cases.

@zackw
Copy link
Contributor Author

zackw commented Apr 7, 2016

Note: having now actually read PEP 8, it appears generally to endorse the principle that local readability is more important than complete conformance with the rules, and it specifically calls out "a bunch of related one-liners (e.g. a set of dummy implementations)" as not needing to have blank lines between them.

There's another interesting case I meant to mention: rtt.py has a set of configuration parameter variables as the first thing in the file after the module docstring, before the imports. PEP 8 says the imports should be immediately after the docstring, but I don't want to do that because the import block is long and complicated (due to various compatibility hacks); I think people might not realize there were "please edit these for your use case" variables if they were under the imports. Instead I would like to see about getting module-specific command line arguments implemented so that people do not need to edit the module, and then these can just go away.

@zackw
Copy link
Contributor Author

zackw commented Apr 7, 2016

Filed PyCQA/pycodestyle#495 on the alignment issue. Blank lines within a group of stubs is already filed as PyCQA/pycodestyle#457.

@NullHypothesis NullHypothesis merged commit d3d197a into NullHypothesis:master Apr 11, 2016
@NullHypothesis
Copy link
Owner

Thanks again.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants