Elara is altering link ids #240

Open
D-Dulius opened this issue Feb 7, 2025 · 11 comments
@D-Dulius

D-Dulius commented Feb 7, 2025

We have discovered an elara bug while working on the Basildon ABM build. For various validation exercises we use some of the elara outputs, in this specific case link_vehicle_speeds_car_average.geojson.

Comparing this output with the genet network used by the same sim, we can see that links with the same from/to node ids have different link ids.

For example, sim 2019_baseline_new_network_controller_10pc_20250108 according to the matsim config uses network v5_mad_prairie via:

<param name="inputNetworkFile" value="/efs/basildon/network/v5_mad_prairie/plus_wider_te/network.xml"/>

Link id 402018 in link_vehicle_speeds_car_average.geojson has the from and to nodes 5177106139033910521 and 5177106139066155527, whereas in v5_mad_prairie the link with the same from and to node ids has link id 68892.

Upon finding this, I also remembered seeing this exact same issue flagged as a comment in Alex K's old BERTIE validation jupyter notebooks (some of which we still use and converted into scripts as part of our current BERTIE validation workflow), see: https://github.com/arup-group/te_post_processing/blob/6f141b0cdb94025df21c6174cf468b23ac8ff81f/benchmarking_2023_refresh/Dashboard/SERTM%20Benchmark%20-%20By%20Vehicle%20Type_v2.py#L189C5-L196C63

At the time I inherited the above I had no real idea about elara or what it did, and because the notebooks were so old I assumed that whatever this bug was, it had been fixed by the time I took over looking after validation for our V2 refresh of BERTIE in 2023. The current workaround in the benchmarking scripts is to join on the from and to node ids to a pre-elara version of the network, or to a version we know is correct. This is what I have suggested we do for Basildon while we investigate this issue (elara seems to be altering only link ids; node ids are unaffected, as the example above confirms).
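The node-pair workaround described above could look roughly like the sketch below. It is only an illustration: the field names `id`, `from` and `to` and both helper functions are assumptions, not elara or genet API, and the sample records reuse the ids quoted in this issue. Node ids are kept as strings so they cannot lose precision by being read as floats.

```python
def build_node_pair_lookup(network_links):
    """Map (from_node, to_node) -> link id for the trusted (pre-elara) network."""
    lookup = {}
    for link in network_links:
        key = (link["from"], link["to"])
        # Parallel links between the same node pair would make the join ambiguous.
        if key in lookup:
            raise ValueError(f"node pair {key} maps to more than one link id")
        lookup[key] = link["id"]
    return lookup


def attach_network_ids(elara_links, lookup):
    """Return copies of the elara links with the matching network link id (or None)."""
    return [
        {**link, "network_link_id": lookup.get((link["from"], link["to"]))}
        for link in elara_links
    ]


# Hypothetical records using the ids from this issue.
network_links = [
    {"id": "68892", "from": "5177106139033910521", "to": "5177106139066155527"},
]
elara_links = [
    {"id": "402018", "from": "5177106139033910521", "to": "5177106139066155527"},
]

lookup = build_node_pair_lookup(network_links)
matched = attach_network_ids(elara_links, lookup)
print(matched[0]["id"], "->", matched[0]["network_link_id"])
```

The lookup raises on duplicate node pairs on purpose: if the network has parallel links between the same two nodes, a node-pair join cannot tell them apart and needs an extra key.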

Image

Image

@D-Dulius D-Dulius added the bug Something isn't working label Feb 7, 2025
@divyasharma-arup
Contributor

Very quick question - we had talked about indexing the network data because the string information for link_ids makes the datasets very large. Are the link_ids in link_vehicle_speeds_car_average.geojson all integers, or do you see formats such as: 5177106139033910521_5177106139066155527? I'm just wondering if the elara file has been re-indexed for any reason.

@divyasharma-arup
Contributor

Reading through some of what is posted on slack, it seems like the feature of re-indexing elara outputs to reduce size (and improve run times) means the link_ids are no longer compatible with the original network file. I don't know if we have an "elara" version of the network file @syhwawa, that has the re-indexed link_ids?

@D-Dulius
Author

D-Dulius commented Feb 10, 2025

> Very quick question - we had talked about indexing the network data because the string information for link_ids makes the datasets very large. Are the link_ids in link_vehicle_speeds_car_average.geojson all integers, or do you see formats such as: 5177106139033910521_5177106139066155527? I'm just wondering if the elara file has been re-indexed for any reason.

@divyasharma-arup I can see some link ids in the long node-to-node format, e.g. 5177106139033910521_5177106139066155527, in link_vehicle_speeds_car_average.geojson. I didn't realise elara re-indexing link ids was actually a feature, not a bug! It has caused problems, though, because I didn't know: I gave the elara output above to some of the team/Dan for their network review, so when those links were handed over to Neil the link ids didn't match the genet network.

Though if elara was re-indexing properly then that should imply there should be no link ids which are non-integers in the elara outputs, is that right?
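A quick way to test that assumption is to scan the GeoJSON for link ids that are not plain integers. The sketch below is only illustrative: it assumes the link id sits in a feature property called `id`, which may not match the real elara output, and it runs on a small in-memory sample rather than the actual file.

```python
import json


def is_integer_id(link_id):
    """True if the id consists only of ASCII digits, e.g. '402018'."""
    text = str(link_id)
    return text.isascii() and text.isdigit()


def non_integer_link_ids(geojson, id_property="id"):
    """Collect the link ids in a GeoJSON FeatureCollection that are not integers."""
    return [
        str(feature["properties"][id_property])
        for feature in geojson["features"]
        if not is_integer_id(feature["properties"][id_property])
    ]


# Hypothetical sample with one re-indexed id and one node-to-node style id.
sample = json.loads("""
{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature", "geometry": null, "properties": {"id": "402018"}},
    {"type": "Feature", "geometry": null,
     "properties": {"id": "5177106139033910521_5177106139066155527"}}
  ]
}
""")

bad = non_integer_link_ids(sample)
print(f"{len(bad)} non-integer link id(s): {bad}")
```

For the real file, `json.load` on the opened .geojson would replace the inline sample.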

@D-Dulius
Author

I understand the reasons for reindexing link ids like you say above (run times) but would it not be better to just apply the reindexing at the genet/network creation stage? So we have a consistent network both pre/post elara?

@divyasharma-arup
Contributor

> Though if elara was re-indexing properly then that should imply there should be no link ids which are non-integers in the elara outputs, is that right?

Assuming this is the reason why the link_ids are different between the two files, yes, I'd expect link_vehicle_speeds_car_average.geojson to only have integer link_ids...and I agree, the re-indexing should be consistent.

Just a note that elara's scope is MATSim outputs (I think). So things like synthesis inputs (population & network) will have their original information -- which we want to maintain for traceability of the process. Therefore, we should probably only be comparing elara outputs with each other. Have you ever worked with output_network.xml? I am hoping that the link_ids will be consistent here, as we definitely assume they are in a lot of our code...!

@D-Dulius
Author

D-Dulius commented Feb 10, 2025

No, I haven't worked with output_network.xml. We now use link_vehicle_speeds_car_average.geojson specifically for benchmarking routed link-based journey times, and because I thought the network was consistent pre/post elara, I gave the elara output to others in the Basildon team for network reviews. There is no way they (i.e. the strategic modellers who have been resourced on the project) would know how to use an xml for this purpose, so this is potentially something for us to think about.

(elara outputs are easier to use for non-CMLers with no programming background as they can just stick it in QGIS)

@D-Dulius
Author

I myself don't really have any experience using/parsing xml files hehe, can geopandas read in xml files?

@divyasharma-arup
Contributor

The ersa object is useful for that: sample script.

Your point is noted that the output_network.xml is not easy to work with (which is why it's parsed in the above). Something for us to think about in regards to whether ersa should be the home for this code of parsing the network or whether it should be elara.
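For anyone who wants to peek at the link ids without ersa, a MATSim network XML can also be read with the Python standard library. This is a minimal sketch, not the ersa or elara approach: it assumes the usual MATSim layout of `<link>` elements carrying `id`, `from` and `to` attributes, and it parses a tiny inline sample rather than a real output_network.xml.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-node, one-link network in the MATSim network layout.
SAMPLE = """<network>
  <nodes>
    <node id="5177106139033910521" x="0.0" y="0.0"/>
    <node id="5177106139066155527" x="100.0" y="0.0"/>
  </nodes>
  <links>
    <link id="68892" from="5177106139033910521" to="5177106139066155527"
          length="100.0" freespeed="13.9" capacity="600.0" permlanes="1.0" modes="car"/>
  </links>
</network>"""


def read_links(root):
    """Return one dict of attributes per <link>; all values stay as strings."""
    return [dict(link.attrib) for link in root.iter("link")]


root = ET.fromstring(SAMPLE)
links = read_links(root)
for link in links:
    print(link["id"], link["from"], link["to"])
```

For a file on disk, `ET.parse(path).getroot()` would replace `ET.fromstring(SAMPLE)`. The resulting list of dicts can be passed to `pandas.DataFrame` if a table is easier to work with.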

@divyasharma-arup
Contributor

you can do this with ersa:

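import os

# Scenario, Network and Table come from ersa (import path not shown in this thread).
def create_scenario(path_scenario):
    """Load the MATSim output network and the link log from one scenario directory."""
    s = Scenario(data={
        'network': Network(path=os.path.join(path_scenario, 'output_network.xml'), crs='27700'),
        'link_logs': Table(
            path=os.path.join(path_scenario, 'vehicle_link_log_all.csv'))
    })
    return s


# path_baseline: directory holding the baseline scenario outputs.
baseline = create_scenario(path_baseline)
baseline.network.link_lengths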

@D-Dulius
Author

> you can do this with ersa:
>
> def create_scenario(path_scenario):
>     s = Scenario(data={
>         'network': Network(path=os.path.join(path_scenario, 'output_network.xml'), crs='27700'),
>         'link_logs': Table(
>             path=os.path.join(path_scenario, 'vehicle_link_log_all.csv'))
>     })
>     return s
>
>
> baseline = create_scenario(path_baseline)
> baseline.network.link_lengths

Ah yes of course, I forgot ersa loads in the output xml files. In that case I have indirectly worked with xml files via ersa lol. I will take a look at the xml for the link ids which could not be matched last week.

@syhwawa
Contributor

syhwawa commented Feb 10, 2025

> Reading through some of what is posted on slack, it seems like the feature of re-indexing elara outputs to reduce size (and improve run times) means the link_ids are no longer compatible with the original network file.

Could you remind me where the conversation/thread about this was, please? @divyasharma-arup.

TBH, I thought the link_id should be the same as in the network file, but it seems it isn't.

Using ersa to load the matsim output network might be the easiest way, and it's worth examining the elara code to check where the reindexing happens.
