Label geographic coordinate to district coordinate #4

Open
juifa-tsai opened this issue Jul 24, 2017 · 4 comments

Comments

juifa-tsai commented Jul 24, 2017

According to the preliminary EDA, the pickup & dropoff locations are likely to be significant variables for the analysis. They are stored as geographic coordinates, i.e. longitude and latitude. However, the raw values may not be meaningful and may introduce bias in a regression, since they label a location mathematically rather than measure a quantity. So finding a meaningful & efficient way to label locations is an open issue. Here is a basic idea: categorize each point by the district it belongs to instead of using continuous values:

  1. Category types/ranges: Boroughs (5) > Community areas/boards, CB (max. 18 per borough) > Neighborhoods (?)
  2. Encode the label as binary bits, e.g. 010 110 for one of the CBs (59 in total).
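
A minimal sketch of the binary-bit labeling in item 2, assuming each CB is first given a 0-based integer index (the indexing scheme and the helper name `encode_binary` are hypothetical, not something agreed in this thread):

```python
from typing import List


def encode_binary(index: int, n_categories: int) -> List[int]:
    """Encode a 0-based category index as a fixed-width list of bits."""
    if not 0 <= index < n_categories:
        raise ValueError("index out of range")
    # Smallest width that can represent the largest index (n_categories - 1).
    width = max(1, (n_categories - 1).bit_length())
    # Most significant bit first.
    return [(index >> shift) & 1 for shift in range(width - 1, -1, -1)]


# 59 community boards need 6 bits; index 22 becomes 010 110.
bits = encode_binary(22, 59)
```

Compared with one-hot encoding (59 columns), this keeps the feature count at 6, at the cost of bits that have no individual meaning.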

Discussion :

  1. Should we use multiple categories or a single category?
  2. Should we use CBs or Neighborhoods?
  3. How do we extract the neighborhood numbers?
  4. How do we map a geographic coordinate to a particular district category?

yennanliu commented Jul 24, 2017

Re:
1-3. To the discussion list.
4. You can google "polygon "; there are already tools that do this. It may be interesting if we can extract features from lon & lat.
maybe check :

for more information
cheers!
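
For reference, the polygon lookup mentioned in point 4 can be sketched in plain Python with the ray-casting test. This is only an illustration: the function names and the toy rectangular districts are made up, and real borough or CB boundaries would have to come from an actual boundary file. Existing geometry libraries do the same job more robustly.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]  # (lon, lat)


def point_in_polygon(point: Point, polygon: List[Point]) -> bool:
    """Ray casting: count how many edges a ray going right from the point crosses."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point count.
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def label_district(point: Point, districts: Dict[str, List[Point]]) -> Optional[str]:
    """Return the name of the first district containing the point, else None."""
    for name, polygon in districts.items():
        if point_in_polygon(point, polygon):
            return name
    return None


# Toy districts (hypothetical unit squares, not real NYC boundaries).
districts = {
    "A": [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)],
    "B": [(1.0, 0.0), (2.0, 0.0), (2.0, 1.0), (1.0, 1.0)],
}
```

Points that fall exactly on a shared boundary are assigned somewhat arbitrarily by this test, so a real pipeline would need a rule for them.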

@juifa-tsai

Or could we define our own regions, without any inherent meaning?
For instance, cut the map into many square boxes?
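
A rough sketch of the square-box idea, assuming boxes of equal size in degrees and a hand-picked south-west corner for the map (the function name, the corner values and the 0.01 degree cell size below are all placeholders, not values from the dataset):

```python
import math
from typing import Tuple


def grid_cell(lon: float, lat: float,
              lon_min: float, lat_min: float,
              cell_size_deg: float) -> Tuple[int, int]:
    """Return the (row, col) of the square box that contains the point."""
    col = math.floor((lon - lon_min) / cell_size_deg)
    row = math.floor((lat - lat_min) / cell_size_deg)
    return row, col


# Assumed map origin roughly south-west of NYC; adjust to the real data range.
row, col = grid_cell(-73.985, 40.758, lon_min=-74.3, lat_min=40.5, cell_size_deg=0.01)
```

Note that a box measured in degrees is not a true square on the ground, since a degree of longitude is shorter than a degree of latitude at this latitude.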


yennanliu commented Jul 24, 2017

Yes, we can! I did something similar before. We can separate the city into groups like "most busy", "busy", "medium", and "casual".
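
One possible way to get such groups, sketched under the assumption that each trip has already been assigned to a region id and that "busy" simply means "many pickups": rank the regions by pickup count and split the ranking into four equal parts. The rank-based split and the name `density_labels` are illustrative choices, not the method used before.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable

LABELS = ["casual", "medium", "busy", "most busy"]  # least to most dense


def density_labels(regions: Iterable[Hashable]) -> Dict[Hashable, str]:
    """Label each region by the quarter of the pickup-count ranking it falls in."""
    counts = Counter(regions)
    # Ascending by count; ties are broken by region id so the result is stable.
    ranked = sorted(counts, key=lambda r: (counts[r], r))
    n = len(ranked)
    return {region: LABELS[i * len(LABELS) // n] for i, region in enumerate(ranked)}


# Toy input: one region id per pickup.
pickups = [(0, 0)] * 1 + [(0, 1)] * 2 + [(1, 0)] * 3 + [(1, 1)] * 4
labels = density_labels(pickups)
```

Because the split is by rank, regions with equal counts can land in different groups; fixed count thresholds would avoid that.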


juifa-tsai commented Jul 24, 2017

Cool, I think it is quite important to do this before diving into ML.
I'd recommend doing both:

  1. define box regions as a chessboard -> for ML
  2. label each box by density -> gives the chessboard meaning

What do you think?

I can try to build a function for defining the chessboard & the linear distance this weekend, if you don't mind the delay lol
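
As a starting point for the distance part, here is a sketch that takes "linear distance" to mean the straight-line (great-circle) distance between pickup and dropoff and computes it with the haversine formula. That interpretation, the function name and the mean Earth radius of 6371 km are assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in kilometres between two (lon, lat) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))
```

Over city-scale trips a flat-plane approximation would give almost the same numbers; haversine is used here only because it needs no projection step.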
