Documenting behavior for aggregations that can't be calculated #96

Open
smmaurer opened this issue Jul 16, 2018 · 1 comment

Comments

@smmaurer
Member

It looks like pandana.Network.aggregate() returns values of -1 for source nodes where an aggregation can't be calculated, for example if there aren't any other nodes within the distance radius. I can't find a reference to this in the documentation, though. We should confirm what the behavior is and make it more explicit.

Docstrings for pandana.Network.aggregate(): https://github.com/UDST/pandana/blob/master/pandana/network.py#L274-L320

Sphinx documentation: http://udst.github.io/pandana/network.html#pandana.network.Network.aggregate

Several conditions in the C++ code produce values of -1, but I haven't traced out the details: https://github.com/UDST/pandana/blob/master/src/accessibility.cpp
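Until the behavior is confirmed and documented, one way to guard downstream code is to treat -1 as a sentinel and mask it. This is only a sketch, assuming that -1 really does mean "could not be calculated" and that it can't occur as a legitimate value for the variable being aggregated. The helper name `mask_uncomputed` is hypothetical, and the Series below is a hand-built stand-in for the output of pandana.Network.aggregate(), not real output.

```python
import numpy as np
import pandas as pd


def mask_uncomputed(results: pd.Series, sentinel: float = -1) -> pd.Series:
    """Replace the sentinel with NaN so it is not mistaken for a real aggregate.

    Assumption: the sentinel marks nodes where no aggregation could be
    calculated, as described in this issue.
    """
    as_float = results.astype(float)
    return as_float.mask(as_float == sentinel)


# Stand-in for an aggregation result indexed by node id (made-up values)
results = pd.Series([120.0, -1.0, 0.0, 35.5], index=[101, 102, 103, 104])
cleaned = mask_uncomputed(results)
```

Masking to NaN keeps a true zero (node 103 here) distinct from a node with no result, which matters for sums and means computed later.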

@smmaurer
Member Author

Related to this are the messages about dropped rows that sometimes show up when you run an aggregation calculation:

Computing pop_500_walk
Removed 189769 rows because they contain missing values

These messages are generated by the pandana.Network.set() call that links the values being aggregated to the network.

https://github.com/UDST/pandana/blob/master/pandana/network.py#L235

Here's what happens, for the example of aggregating a variable from the households table:

  • if filters are provided, rows that don't pass the filters are removed from the households table first
  • then, pandana tries to link each remaining row to a network node
  • any rows that are missing a node id, or that have a NaN in the column being aggregated, are dropped
  • the message counts rows dropped from the households table, not nodes dropped from the network

Often, the rows are dropped because they can't be matched to nodes (for example households that are not assigned to buildings and thus don't have a spatial location), not because of missing values in the data column.

Rows that are explicitly filtered out aren't counted in the message, so the number of dropped rows can vary between aggregations on the same table when different filters are used.
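To see which of the two causes accounts for a "Removed N rows" message, something like the following could be run on the table before aggregating. It is a rough sketch in plain pandas that mirrors the steps described above; it does not call pandana, and the function name `explain_dropped_rows`, the column names `node_id` and `persons`, and the toy data are all made up for illustration.

```python
import numpy as np
import pandas as pd


def explain_dropped_rows(df: pd.DataFrame, node_col: str, value_col: str) -> dict:
    """Count rows that would be dropped, split by the reason for the drop."""
    missing_node = df[node_col].isna()
    missing_value = df[value_col].isna()
    return {
        "missing_node_id_only": int((missing_node & ~missing_value).sum()),
        "missing_value_only": int((~missing_node & missing_value).sum()),
        "missing_both": int((missing_node & missing_value).sum()),
        "total_dropped": int((missing_node | missing_value).sum()),
    }


# Toy households table: apply any filters first, then inspect what remains
households = pd.DataFrame({
    "node_id": [1, 2, np.nan, 4, np.nan],
    "persons": [3, np.nan, 2, 1, np.nan],
})
summary = explain_dropped_rows(households, "node_id", "persons")
```

Under these assumptions, `total_dropped` should correspond to the count in the message, and the other keys show how much of it comes from unmatched rows rather than missing data.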

Here is a notebook where we dug into this: More-aggregation-troubleshooting.ipynb
