Elara is altering link ids #240

D-Dulius · 2025-02-07T11:53:18Z

We have discovered an elara bug while working on the Basildon ABM build. For various validation exercises we use some of the elara outputs, in this specific case link_vehicle_speeds_car_average.geojson.

Comparing this output with the genet network used by the same sim, we can see that the link ids differ for links with the same from, to node ids.

For example, sim 2019_baseline_new_network_controller_10pc_20250108 according to the matsim config uses network v5_mad_prairie via:

<param name="inputNetworkFile" value="/efs/basildon/network/v5_mad_prairie/plus_wider_te/network.xml"/>

Link id 402018 in link_vehicle_speeds_car_average.geojson has the following from, to nodes: 5177106139033910521 and 5177106139066155527, whereas in v5_mad_prairie the link with the same from, to node ids has link id 68892.

Upon finding this, I also remembered seeing this exact same issue flagged as a comment in Alex K's old BERTIE validation jupyter notebooks (some of which we still use and converted into scripts as part of our current BERTIE validation workflow), see: https://github.com/arup-group/te_post_processing/blob/6f141b0cdb94025df21c6174cf468b23ac8ff81f/benchmarking_2023_refresh/Dashboard/SERTM%20Benchmark%20-%20By%20Vehicle%20Type_v2.py#L189C5-L196C63

When I inherited the above I had no real idea about elara or what it did, and because the notebooks were so old I assumed that whatever this bug was had been fixed by the time I took over validation for our V2 refresh of BERTIE in 2023. The current workaround in the benchmarking scripts is to join on the from, to node ids to a version of the network that is pre-elara or that we know is correct. This is what I have suggested we do for Basildon while we investigate this issue (elara seems to alter only link ids; node ids are unaffected, as the above example confirms).
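A minimal sketch of the node-pair join workaround described above, using pandas. The column names `id`, `from` and `to` and the helper name `attach_network_link_ids` are assumptions for illustration, not the names used in the benchmarking scripts; the ids are the ones from the example in this issue.

```python
import pandas as pd


def attach_network_link_ids(elara_links, network_links):
    """Attach the trusted network's link id to each elara record via (from, to)."""
    key = ["from", "to"]  # hypothetical column names for the node ids
    lookup = network_links[key + ["id"]].rename(columns={"id": "network_link_id"})
    # many_to_one raises if the trusted network has two links on one node pair,
    # in which case the node pair alone cannot identify the link.
    return elara_links.merge(
        lookup, on=key, how="left", validate="many_to_one", indicator=True
    )


# Node ids are kept as strings so they are compared exactly as written.
elara_links = pd.DataFrame(
    {"id": ["402018"], "from": ["5177106139033910521"], "to": ["5177106139066155527"]}
)
network_links = pd.DataFrame(
    {"id": ["68892"], "from": ["5177106139033910521"], "to": ["5177106139066155527"]}
)

joined = attach_network_link_ids(elara_links, network_links)
unmatched = joined[joined["_merge"] != "both"]  # elara links with no network match
```

Rows left in `unmatched` are the ones worth inspecting by hand, since a node pair that is missing from the trusted network points to something other than re-indexing.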


divyasharma-arup · 2025-02-10T09:30:01Z

Very quick question - we had talked about indexing the network data because the string information for link_ids makes the datasets very large. Are the link_ids in link_vehicle_speeds_car_average.geojson all integers, or do you see formats such as: 5177106139033910521_5177106139066155527? I'm just wondering if the elara file has been re-indexed for any reason.

divyasharma-arup · 2025-02-10T09:34:35Z

Reading through some of what is posted on slack, it seems like the feature of re-indexing elara outputs to reduce size (and improve run times) means the link_ids are no longer compatible with the original network file. I don't know if we have an "elara" version of the network file @syhwawa, that has the re-indexed link_ids?

D-Dulius · 2025-02-10T09:43:43Z

Very quick question - we had talked about indexing the network data because the string information for link_ids makes the datasets very large. Are the link_ids in link_vehicle_speeds_car_average.geojson all integers, or do you see formats such as: 5177106139033910521_5177106139066155527? I'm just wondering if the elara file has been re-indexed for any reason.

@divyasharma-arup I can see some link ids with the long node-to-node format, i.e. 5177106139033910521_5177106139066155527, in link_vehicle_speeds_car_average.geojson. I didn't realise elara re-indexing link ids was actually a feature, not a bug! It has caused problems though, because I didn't know: I gave the elara output above to some of the team/Dan for their network review, so when those links were handed over to Neil the link ids didn't match the genet network.

Though if elara was re-indexing properly, that should imply there are no non-integer link ids in the elara outputs, is that right?
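One way to answer this question is to split the link ids in the GeoJSON into integer and non-integer groups. The sketch below assumes the link id sits in each feature's `properties` under a field called `id`; that field name and the helper `split_link_ids` are assumptions, and the sample features are made up from the ids quoted in this thread.

```python
import json


def split_link_ids(feature_collection, id_field="id"):
    """Return (integer_ids, other_ids) for the features of a GeoJSON dict."""
    integer_ids, other_ids = [], []
    for feature in feature_collection["features"]:
        link_id = str(feature["properties"][id_field])
        # Purely numeric ids look re-indexed; anything else (for example
        # "<from>_<to>") kept a string form.
        if link_id.isdigit():
            integer_ids.append(link_id)
        else:
            other_ids.append(link_id)
    return integer_ids, other_ids


# Small in-memory stand-in; for a real file use json.load(open(path)).
sample = json.loads("""
{"type": "FeatureCollection", "features": [
  {"type": "Feature", "geometry": null, "properties": {"id": "402018"}},
  {"type": "Feature", "geometry": null,
   "properties": {"id": "5177106139033910521_5177106139066155527"}}
]}
""")

integer_ids, other_ids = split_link_ids(sample)
```

A non-empty `other_ids` on the real output would suggest the re-indexing was applied to only part of the file.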

D-Dulius · 2025-02-10T09:48:44Z

I understand the reasons for re-indexing link ids, as you say above (run times), but would it not be better to apply the re-indexing at the genet/network creation stage, so that we have a consistent network both pre and post elara?

divyasharma-arup · 2025-02-10T10:06:59Z

Though if elara was re-indexing properly, that should imply there are no non-integer link ids in the elara outputs, is that right?

Assuming this is the reason why the link_ids are different between the two files, yes, I'd expect link_vehicle_speeds_car_average.geojson to only have integer link_ids...and I agree, the re-indexing should be consistent.

Just a note that elara's scope is MATSim outputs (I think). So things like synthesis inputs (population & network) will have their original information -- which we want to maintain for traceability of the process. Therefore, we should probably only be comparing elara outputs with each other. Have you ever worked with output_network.xml? I am hoping that the link_ids will be consistent here, as we definitely assume they are in a lot of our code...!

D-Dulius · 2025-02-10T10:20:16Z

I haven't worked with output_network.xml, no. We now use link_vehicle_speeds_car_average.geojson specifically for benchmarking routed link-based journey times, and because I thought the network was consistent pre/post elara I gave the elara output to others in the Basildon team for network reviews. There is no way they would know how to use an xml for this purpose (i.e. the strategic modellers who have been resourced on the project), so this is potentially something for us to think about.

(elara outputs are easier to use for non-CMLers with no programming background as they can just stick it in QGIS)

D-Dulius · 2025-02-10T10:25:15Z

I myself don't really have any experience using/parsing xml files hehe, can geopandas read in xml files?

divyasharma-arup · 2025-02-10T10:30:02Z

ersa object is useful for that: sample script.

Your point is noted that the output_network.xml is not easy to work with (which is why it's parsed in the above). Something for us to think about is whether the network-parsing code should live in ersa or in elara.

divyasharma-arup · 2025-02-10T10:31:36Z

you can do this with ersa:

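import os

# Scenario, Network and Table are ersa classes; import them from ersa before running.
def create_scenario(path_scenario):
    # Load the MATSim output network and the link log from one scenario folder
    s = Scenario(data={
        'network': Network(path=os.path.join(path_scenario, 'output_network.xml'), crs='27700'),
        'link_logs': Table(
            path=os.path.join(path_scenario, 'vehicle_link_log_all.csv'))
    })
    return s


# path_baseline is the path to the baseline scenario's output folder
baseline = create_scenario(path_baseline)
baseline.network.link_lengths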
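For anyone without ersa to hand, the link ids and node pairs can also be pulled out of a MATSim network XML with the standard library and pandas. This is only a sketch, not how ersa does it: the helper name `read_matsim_links` is hypothetical, and the sample document is a made-up two-node network built from the ids quoted in this issue. It relies on the MATSim network layout of `<link>` elements carrying `id`, `from`, `to` and `length` attributes.

```python
import io
import xml.etree.ElementTree as ET

import pandas as pd


def read_matsim_links(source):
    """Read id, from, to and length of every <link> into a DataFrame.

    `source` is a file path or a binary file object.
    """
    rows = []
    # iterparse streams the file, which helps with large networks.
    for _, element in ET.iterparse(source):
        if element.tag == "link":
            rows.append(
                {
                    "id": element.get("id"),
                    "from": element.get("from"),
                    "to": element.get("to"),
                    "length": float(element.get("length")),
                }
            )
            element.clear()  # free the element once its attributes are copied
    return pd.DataFrame(rows, columns=["id", "from", "to", "length"])


# Made-up sample; a real call would pass the path to output_network.xml.
sample_xml = b"""<?xml version="1.0" encoding="UTF-8"?>
<network>
  <nodes>
    <node id="5177106139033910521" x="0.0" y="0.0"/>
    <node id="5177106139066155527" x="100.0" y="0.0"/>
  </nodes>
  <links>
    <link id="68892" from="5177106139033910521" to="5177106139066155527"
          length="100.0" freespeed="13.4" capacity="600.0" permlanes="1.0" modes="car"/>
  </links>
</network>
"""

links = read_matsim_links(io.BytesIO(sample_xml))
```

The resulting table has the same `id`, `from` and `to` columns needed to compare link ids against an elara output by node pair.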

D-Dulius · 2025-02-10T10:34:41Z

you can do this with ersa:

def create_scenario(path_scenario):
    s = Scenario(data={
        'network': Network(path=os.path.join(path_scenario, 'output_network.xml'), crs='27700'),
        'link_logs': Table(
            path=os.path.join(path_scenario, 'vehicle_link_log_all.csv'))
    })
    return s


baseline = create_scenario(path_baseline)
baseline.network.link_lengths

Ah yes of course, I forgot ersa loads in the output xml files. In that case I have indirectly worked with xml files via ersa lol. I will take a look at the xml for the link ids which could not be matched last week.

syhwawa · 2025-02-10T15:46:15Z

Reading through some of what is posted on slack, it seems like the feature of re-indexing elara outputs to reduce size (and improve run times) means the link_ids are no longer compatible with the original network file.

Could you remind me where the conversation/thread about this was, please? @divyasharma-arup.

TBH, I thought the link_id should be the same as in the network file, but it seems it isn't.

Using ersa to load the matsim output network might be the easiest way, and it's worth examining the elara code to check where the re-indexing happens.

D-Dulius added the "bug" (Something isn't working) label Feb 7, 2025

D-Dulius assigned D-Dulius, divyasharma-arup, syhwawa, neilmt and gac55 Feb 7, 2025
