application called MPI_Abort(MPI_COMM_WORLD, 821) - process 570 #834

Open
J-shel opened this issue Oct 24, 2022 · 4 comments
Labels
bug Something isn't working

Comments

J-shel commented Oct 24, 2022

Hello,
My wind input has dimensions (81, 1440, 721), i.e. (time, longitude, latitude).
ww3_shel ran successfully with 256 and 512 MPI processes, but it called MPI_Abort when run with 1024 MPI processes.

Error message in the log file

shel start
Abort(821) on node 570 (rank 570 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 821) - process 570
shel end

Error message in ww3_shel.out

...
Type 5 : Nesting data
-----------------------------------------
From : 2022/08/04 12:00:00 UTC
To : 2022/08/14 12:00:00 UTC
Interval : 01:00:00

   Wave model ...

w3servmd MPI_ABORT, IEXIT= 821
w3servmd UNIT missing
w3servmd MSG missing
w3servmd FILE missing
w3servmd LINE missing
w3servmd COMM missing

Does WW3 have an upper limit on the number of MPI processes, and if so, how is it determined?
Thank you!

J-shel added the bug label Oct 24, 2022
aliabdolali (Contributor) commented

With the card deck decomposition, the upper limit on the number of MPI processes is NSPEC = (number of frequency bins) × (number of directions).
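A minimal sketch of that check, in case it helps before submitting a job. The function names are hypothetical and are not part of WW3; the only thing taken from the comment above is the rule that the process count should not exceed NSPEC. The numbers in the usage lines are illustrative and are not the reporter's actual settings.

```python
def max_mpi_processes(n_freq: int, n_dir: int) -> int:
    """Return NSPEC = n_freq * n_dir, the process-count cap described above."""
    if n_freq <= 0 or n_dir <= 0:
        raise ValueError("n_freq and n_dir must be positive integers")
    return n_freq * n_dir


def process_count_ok(n_procs: int, n_freq: int, n_dir: int) -> bool:
    """True if n_procs does not exceed NSPEC for the given spectral grid."""
    return n_procs <= max_mpi_processes(n_freq, n_dir)


if __name__ == "__main__":
    # Illustrative values only: 30 frequency bins and 24 directions.
    nspec = max_mpi_processes(30, 24)
    for n_procs in (256, 512, 1024):
        status = "ok" if process_count_ok(n_procs, 30, 24) else "exceeds NSPEC"
        print(f"{n_procs} processes vs NSPEC={nspec}: {status}")
```

With these made-up values, 256 and 512 processes fit under the cap while 1024 does not, which is the same pattern as in the report, but the real cap depends on the spectral settings of the run.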

J-shel (Author) commented Oct 25, 2022

Hi, I couldn't find any parameters named "number of frequency bins" or "number of directions". Is the limit related to the workload, i.e., the shape of the input data? Could you explain it a little more?

aliabdolali (Contributor) commented

https://github.com/NOAA-EMC/WW3/blob/develop/model/inp/ww3_grid.inp#L15
In this example, 25 and 24 are the number of frequencies and the number of directions, so you cannot use more than 25 × 24 = 600 cores. Check the corresponding values in your own ww3_grid.inp to calculate your upper limit.
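As a rough sketch, the two values can be pulled out of a ww3_grid.inp file programmatically. This assumes, without having checked every WW3 version, that comment lines start with `$` and that the spectral definition is the first non-comment line whose first two fields are real numbers and whose third and fourth fields are integers (number of frequencies, number of directions). `read_spectral_dims` is a hypothetical helper, and the sample text below is made up for illustration; only the 25 and 24 come from the comment above.

```python
from typing import Tuple


def read_spectral_dims(grid_inp_text: str) -> Tuple[int, int]:
    """Return (n_freq, n_dir) from ww3_grid.inp text, under the assumed layout."""
    for raw in grid_inp_text.splitlines():
        line = raw.strip()
        # Assumption: '$' marks a comment line in the .inp format.
        if not line or line.startswith("$"):
            continue
        tokens = line.split()
        if len(tokens) < 4:
            continue
        try:
            float(tokens[0])  # assumed: frequency increment factor
            float(tokens[1])  # assumed: first frequency
            n_freq = int(tokens[2])
            n_dir = int(tokens[3])
        except ValueError:
            # Not the spectral line (e.g. the quoted grid name); keep looking.
            continue
        return n_freq, n_dir
    raise ValueError("no spectral definition line found")


# Made-up sample in the assumed layout.
SAMPLE = """\
$ illustrative input, not a real configuration
  'EXAMPLE GRID'
$
   1.1  0.04  25  24  0.0
"""

if __name__ == "__main__":
    n_freq, n_dir = read_spectral_dims(SAMPLE)
    print(f"n_freq={n_freq}, n_dir={n_dir}, upper limit={n_freq * n_dir}")
```

For a real run, read the file contents with `open("ww3_grid.inp").read()` and compare the product against the number of MPI processes requested.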

J-shel (Author) commented Oct 27, 2022

Cool! Thank you very much! O(∩_∩)O

2 participants