RFC: New module 'rtt' and a bunch of changes in aid of what it does #48

zackw · 2016-03-29T21:59:11Z

This PR adds a new module rtt.py, which measures round-trip times through exits, and makes a bunch of improvements to the core in aid of what it does. Please see individual commit messages for explanations.

I'm happy with the changes to the core. However, please note that rtt.py is not yet fully reliable; I could use some debugging help. It starts out working well, but if you run it over lots of exits and destinations, more and more connections fail as time goes by, with errors like

2016-03-27 10:23:42,862 [WARNING]: Failed to attach stream because: Unknown circuit "None"

and

  File ".../exitmap/src/modules/rtt.py", line 148, in perform_probes
    sock = socket.socket(AF_INET, SOCK_STREAM)
  File ".../exitmap/src/torsocks.py", line 343, in torsocket
    return _Torsocket(family, type, proto, _sock)
  File ".../exitmap/src/torsocks.py", line 114, in __init__
    self._authenticate()
  File ".../exitmap/src/torsocks.py", line 181, in _authenticate
    send_queue(self.getsockname())
  File ".../exitmap/src/torsocks.py", line 79, in send_queue
    queue.put([circ_id, sock_name])
  File "<string>", line 2, in put
  File "/usr/lib/python2.7/multiprocessing/managers.py", line 758, in _callmethod
    conn.send((self._id, methodname, args, kwds))
IOError: [Errno 32] Broken pipe

eventhandler.EventHandler.check_finished is called by both the queue thread and the main thread, but the tear-down operations that it performs must only happen once. The call to sys.exit() DOES NOT prevent the tear-down from happening twice. (Is the "queue_thread.daemon = False" line in EventHandler.__init__ perhaps meant to be "= True"? It wouldn't fix the race here, but it might prevent other problems, e.g. the program hanging forever if a module throws an uncaught exception.) Anyway, add an explicit lock.

This was a major revision to the torsocks module, but the _effect_ is simple to explain: a probe module can now do s = socket.socket(AF_UNIX, SOCK_STREAM) s.setblocking(False) err = s.connect_ex(addr) if err == errno.EINPROGRESS: select.select([s], [], []) and the select() will return when the connection is complete. (In a real module that needs this, one would have _several_ pending connections and process them as they become available.) Note that there is a visible difference from how a normal socket behaves, here: you select for the socket to become _readable_ before continuing. A normal socket becomes _writable_ when the TCP handshake completes. There's a bunch of secondary work in here in aid of making the emulation of non-blocking half-open socket behavior as accurate as possible, and an ancillary change (too difficult to disentangle into its own commit) in which we make the monkey-patching of socket.socket more robust. It is no longer necessary to put back the original socket.socket around event queue operations, and the monkey-patching will be 100% cleaned up if a module throws an uncaught exception.

This is forward-compatibility for the next change, in which a new kwarg (that none of them need, but a new module does) will be added.

Formerly, only exits that could connect to _all_ of a module's requested destinations were included in the set of exits to be probed. Now, instead, every exit that can connect to _at least one_ of a module's requested destinations will be included, and probe() receives a new keyword argument, destinations=, that tells it which destinations it can use for each exit. (The destinations list will be a proper subset of module.destinations, with all hostnames resolved to IP addresses.) This does not affect the behavior of any existing module, because all of the existing modules use only a single destination.

The preceding changes are all in aid of this module. Note that it currently isn't 100% robust. It works well for _one_ exit, but if you try to run it over many destinations and all exits, fewer and fewer connections will work correctly as time goes by. I could use some debugging help. selectors34.py and six.py are third-party code licensed under MIT-like licenses. They are backports of Python 3.(>=4) standard library functionality to Python 2, and redistributing them in this fashion is encouraged by their authors.

NullHypothesis · 2016-04-04T17:25:44Z

These changes look great, thanks Zack! Would you mind running the pep8 tool over your files? It analyses the code and points out lines that violate Python's PEP 8 style guide, which I'm trying to honour in exitmap.

Regarding your issue with rtt.py: I wonder if your module triggers some heuristic in Tor that makes it close these circuits. Have you tried starting exitmap in debug mode and looking at all circuit events that happen for a circuit before it causes this error message?

zackw · 2016-04-07T12:50:08Z

Would you mind running the pep8 tool over your files? It analyses the code and points out lines that violate Python's PEP 8 style guide, which I'm trying to honour in exitmap.

This makes a large number of suggestions that IMHO harm readability. The biggest problem is with code of the form

    exits = get_exits(args.data_dir,
                      country_code = args.countrycode,
                      bad_exit     = args.badexit,
                      good_exit    = args.goodexit,
                      version      = args.version,
                      nickname     = args.nickname,
                      address      = args.address)

It is only happy if I write

    exits = get_exits(args.data_dir,
                      country_code=args.countrycode,
                      bad_exit=args.badexit,
                      good_exit=args.goodexit,
                      version=args.version,
                      nickname=args.nickname,
                      address=args.address)

which I think is significantly less readable. This occurs in a whole bunch of places, since there are many functions in this codebase that take lots of keyword arguments. It is also insisting on the insertion of blank lines within in a block of stub methods, e.g.

    # These sockets can only be used as connected sockets.
    def sendto(self, *a): raise NotImplementedError
    def recvfrom(self, *a): raise NotImplementedError
    def recvfrom_into(self, *a): raise NotImplementedError

which would destroy the logical grouping.

Also, pep8 conformance changes to selectors34.py and six.py would be inappropriate, since they are imported verbatim from elsewhere.

I have made all of the pep8 changes that I think improve, or at least do no harm, to readability, and I also have a fix for one of the CI failures (not sure what's up with the other two) but before I revise the pull request I would like to know if you insist on pep8 conformance even in the above cases.

zackw · 2016-04-07T13:05:21Z

Note: having now actually read PEP 8, it appears generally to endorse the principle that local readability is more important than complete conformance with the rules, and it specifically calls out "a bunch of related one-liners (e.g. a set of dummy implementations)" as not needing to have blank lines between them.

There's another interesting case I meant to mention: rtt.py has a set of configuration parameter variables as the first thing in the file after the module docstring, before the imports. PEP 8 says the imports should be immediately after the docstring, but I don't want to do that because the import block is long and complicated (due to various compatibility hacks); I think people might not realize there were "please edit these for your use case" variables if they were under the imports. Instead I would like to see about getting module-specific command line arguments implemented so that people do not need to edit the module, and then these can just go away.

zackw · 2016-04-07T13:23:09Z

Filed PyCQA/pycodestyle#495 on the alignment issue. Blank lines within a group of stubs is already filed as PyCQA/pycodestyle#457.

NullHypothesis · 2016-04-11T03:26:58Z

Thanks again.

zackw added 5 commits March 29, 2016 17:54

Make all probe() functions accept and ignore arbitrary kwargs.

c41a346

This is forward-compatibility for the next change, in which a new kwarg (that none of them need, but a new module does) will be added.

zackw force-pushed the master branch from d14adc6 to bf40a5f Compare March 29, 2016 22:02

NullHypothesis merged commit d3d197a into NullHypothesis:master Apr 11, 2016

NullHypothesis mentioned this pull request Oct 14, 2016

Fixes pep8 violations and cleans up code (#22)(#53) #54

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RFC: New module 'rtt' and a bunch of changes in aid of what it does #48

RFC: New module 'rtt' and a bunch of changes in aid of what it does #48

zackw commented Mar 29, 2016

NullHypothesis commented Apr 4, 2016

zackw commented Apr 7, 2016

zackw commented Apr 7, 2016

zackw commented Apr 7, 2016

NullHypothesis commented Apr 11, 2016

RFC: New module 'rtt' and a bunch of changes in aid of what it does #48

RFC: New module 'rtt' and a bunch of changes in aid of what it does #48

Conversation

zackw commented Mar 29, 2016

NullHypothesis commented Apr 4, 2016

zackw commented Apr 7, 2016

zackw commented Apr 7, 2016

zackw commented Apr 7, 2016

NullHypothesis commented Apr 11, 2016