Statistical bug in primary locations attribution #237

jcombaz · 2024-05-27T12:48:37Z

In the file synthesis/population/spatial/primary, the following code introduces significant spatial correlations between home and primary locations, since duplicated values are consecutive in the data generated by numpy.repeat (which is used to sample locations among candidates). In particular, the probability of two commutes having the exact same origins and destinations is significantly larger than what it should be.

   location_ids = np.repeat(location_ids, location_counts)

    # Construct a data set for all commutes to this zone
    origin_id = np.repeat(df_flow["origin_id"].values, df_flow["count"].values)

    df_result = pd.DataFrame.from_records(dict(
        origin_id = origin_id,
        location_id = location_ids
    ))

To fix this the attribution of origins to destinations should be made independent of the order in the data sets, e.g.:

   location_ids = np.repeat(location_ids, location_counts)

    # Construct a data set for all commutes to this zone
    origin_id = np.repeat(df_flow["origin_id"].values, df_flow["count"].values)

    np.random.shuffle(origin_id)
    np.random.shuffle(location_ids)

    df_result = pd.DataFrame.from_records(dict(
        origin_id = origin_id,
        location_id = location_ids
    ))

The text was updated successfully, but these errors were encountered:

sebhoerl · 2024-05-27T16:26:21Z

Hi Jacques, thanks for bringing this up. I just thought it through, and I think I agree. Though shuffling one is probably sufficient. I have the feeling that maybe the multinomial sampler was implemented differently before at some point and then this error got introduced.

Just out of interest, do you have any statistical analysis that shows the before and after?

jcombaz · 2024-05-28T08:02:55Z

Hi! Yes, I think one shuffing is enough but I haven't tested it. I discovered this "statistical bug" when developing alternative approaches to Aina's analysis of carpooling. Here is an example of an analysis of commute trips based on the synthetic population, in which two trips can be carpooled iff their origins, destinations and departure times are close enough. The following graph shows the total number of "carpoolable" kilometers depending on the spatial constraint on origins and destinations (in meters). The green curve corresponds to the case where origins and destinations are not shuffled, and the purple curve when they are shuffled. The relative difference between these two curves can be significant for smaller values of the spatial parameter. Regards, Jacques.

…

-- Verimag - Bâtiment IMAG - Université Grenoble Alpes 150 place du torrent 38401 Saint Martin d’Hères ***@***.*** www-verimag.imag.fr/~jcombaz phone: +33 (0)4 57 42 22 10 fax: +33 (0)4 57 42 22 22 De: "Sebastian Hörl" ***@***.***> À: "eqasim-org/ile-de-france" ***@***.***> Cc: "Jacques Combaz" ***@***.***>, "Author" ***@***.***> Envoyé: Lundi 27 Mai 2024 18:26:43 Objet: Re: [eqasim-org/ile-de-france] Statistical bug in primary locations attribution (Issue #237) Hi Jacques, thanks for bringing this up. I just thought it through, and I think I agree. Though shuffling one is probably sufficient. I have the feeling that maybe the multinomial sampler was implemented differently before at some point and then this error got introduced. Just out of interest, do you have any statistical analysis that shows the before and after? — Reply to this email directly, [ #237 (comment) | view it on GitHub ] , or [ https://github.com/notifications/unsubscribe-auth/AGNBKMA67ALEBJUJEQWX4JDZENNEHAVCNFSM6AAAAABILEPA62VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCMZTHAYDANZWGE | unsubscribe ] . You are receiving this because you authored the thread. Message ID: ***@***.***>

sebhoerl · 2024-05-28T14:14:33Z

Ok, makes sense. Do you want to create a PR with the fix? Otherwise I can look into it end of the week.

jcombaz · 2024-05-28T14:36:25Z

Yes, I can do that. Should I do my PR directly on the develop branch? Jacques.

…

-- Verimag - Bâtiment IMAG - Université Grenoble Alpes 150 place du torrent 38401 Saint Martin d’Hères ***@***.*** www-verimag.imag.fr/~jcombaz phone: +33 (0)4 57 42 22 10 fax: +33 (0)4 57 42 22 22 De: "Sebastian Hörl" ***@***.***> À: "eqasim-org" ***@***.***> Cc: "Jacques Combaz" ***@***.***>, "Author" ***@***.***> Envoyé: Mardi 28 Mai 2024 16:14:56 Objet: Re: [eqasim-org/ile-de-france] Statistical bug in primary locations attribution (Issue #237) Ok, makes sense. Do you want to create a PR with the fix? Otherwise I can look into it end of the week. — Reply to this email directly, [ #237 (comment) | view it on GitHub ] , or [ https://github.com/notifications/unsubscribe-auth/AGNBKMEDCHKP7QQWJOVZKHLZESGOBAVCNFSM6AAAAABILEPA62VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCMZVGMZDMNBQGU | unsubscribe ] . You are receiving this because you authored the thread. Message ID: ***@***.***>

sebhoerl · 2024-05-29T06:07:59Z

Yes, thanks!

…

________________________________ From: jcombaz ***@***.***> Sent: 28 May 2024 16:36 To: eqasim-org/ile-de-france ***@***.***> Cc: Sebastian HORL ***@***.***>; Comment ***@***.***> Subject: Re: [eqasim-org/ile-de-france] Statistical bug in primary locations attribution (Issue #237) Yes, I can do that. Should I do my PR directly on the develop branch? Jacques.

-- Verimag - Bâtiment IMAG - Université Grenoble Alpes 150 place du torrent 38401 Saint Martin d’Hères ***@***.*** www-verimag.imag.fr/~jcombaz phone: +33 (0)4 57 42 22 10 fax: +33 (0)4 57 42 22 22 De: "Sebastian Hörl" ***@***.***> À: "eqasim-org" ***@***.***> Cc: "Jacques Combaz" ***@***.***>, "Author" ***@***.***> Envoyé: Mardi 28 Mai 2024 16:14:56 Objet: Re: [eqasim-org/ile-de-france] Statistical bug in primary locations attribution (Issue #237) Ok, makes sense. Do you want to create a PR with the fix? Otherwise I can look into it end of the week. — Reply to this email directly, [ https://cas5-0-urlprotect.trendmicro.com:443/wis/clicktime/v1/query?url=https%3a%2f%2fgithub.com%2feqasim%2dorg%2file%2dde%2dfrance%2fissues%2f237%23issuecomment%2d2135326405&umid=0f209477-7367-47e0-9804-7490e3a439b2&auth=b6005005a7b50bc3a68b2003f1e38d069f93f262-7ecdbc96cc0071acbaa3bec49c805354fc04615a | view it on GitHub ] , or [ https://cas5-0-urlprotect.trendmicro.com:443/wis/clicktime/v1/query?url=https%3a%2f%2fgithub.com%2fnotifications%2funsubscribe%2dauth%2fAGNBKMEDCHKP7QQWJOVZKHLZESGOBAVCNFSM6AAAAABILEPA62VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCMZVGMZDMNBQGU&umid=0f209477-7367-47e0-9804-7490e3a439b2&auth=b6005005a7b50bc3a68b2003f1e38d069f93f262-8bf2fcdb6870404c38e083e9274e12861dbf8609 | unsubscribe ] . You are receiving this because you authored the thread. Message ID: ***@***.***> — Reply to this email directly, view it on GitHub<https://cas5-0-urlprotect.trendmicro.com:443/wis/clicktime/v1/query?url=https%3a%2f%2fgithub.com%2feqasim%2dorg%2file%2dde%2dfrance%2fissues%2f237%23issuecomment%2d2135396002&umid=0f209477-7367-47e0-9804-7490e3a439b2&auth=b6005005a7b50bc3a68b2003f1e38d069f93f262-ee06a7da9fdb87251a482b2a59151f986101f9fb>, or unsubscribe<https://cas5-0-urlprotect.trendmicro.com:443/wis/clicktime/v1/query?url=https%3a%2f%2fgithub.com%2fnotifications%2funsubscribe%2dauth%2fAAE6CTHTNNQSIMNRRYDPVGTZESJABAVCNFSM6AAAAABILEPA62VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCMZVGM4TMMBQGI&umid=0f209477-7367-47e0-9804-7490e3a439b2&auth=b6005005a7b50bc3a68b2003f1e38d069f93f262-7dc3e4811ca49903bbe631281f7494076a66ce75>. You are receiving this because you commented.Message ID: ***@***.***>

sebhoerl mentioned this issue Jun 12, 2024

fix: shuffling of primary location candidates #242

Merged

sebhoerl closed this as completed in #242 Jun 17, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Statistical bug in primary locations attribution #237

Statistical bug in primary locations attribution #237

jcombaz commented May 27, 2024

sebhoerl commented May 27, 2024 •

edited

Loading

jcombaz commented May 28, 2024 via email

sebhoerl commented May 28, 2024

jcombaz commented May 28, 2024 via email

sebhoerl commented May 29, 2024 via email

Statistical bug in primary locations attribution #237

Statistical bug in primary locations attribution #237

Comments

jcombaz commented May 27, 2024

sebhoerl commented May 27, 2024 • edited Loading

jcombaz commented May 28, 2024 via email

sebhoerl commented May 28, 2024

jcombaz commented May 28, 2024 via email

sebhoerl commented May 29, 2024 via email

sebhoerl commented May 27, 2024 •

edited

Loading