deadlock and first nf-test #156
Conversation
…hecking if model learns, made dnafloat into default test
…ult be inside each process subdir instead of home dir
…stimulus into nf-test_pipeline
… still missing gpu info
… what nextflow allocates for the single process
Some comments, still not finished
```
- # shuffle the data
- csv_obj.shuffle_labels()
+ # shuffle the data with a default seed. TODO get the seed from the config if and when that is going to be set there.
+ csv_obj.shuffle_labels(seed=42)
```
Why not add it directly to
```
def main(data_csv, config_json, out_path)
```
with a default value?
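A minimal sketch of that suggestion, assuming `main` builds the CSV object itself; `CsvObj` and its `labels` attribute are placeholders for the real class the pipeline uses, only `shuffle_labels(seed=...)` and the `main` signature come from the diff above:

```
import random


class CsvObj:
    """Stand-in for the real CSV handling object in the pipeline."""

    def __init__(self, labels):
        self.labels = labels

    def shuffle_labels(self, seed=None):
        # shuffle the labels in place, reproducibly when a seed is given
        random.Random(seed).shuffle(self.labels)


def main(data_csv, config_json, out_path, seed: int = 42):
    # build the object from the inputs (stubbed here) and shuffle with the seed
    csv_obj = CsvObj(labels=[0, 1, 1, 0])
    csv_obj.shuffle_labels(seed=seed)
    return csv_obj.labels


if __name__ == "__main__":
    print(main("data.csv", "config.json", "out.csv"))
```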
Co-authored-by: Jose Espinosa-Carrasco <[email protected]>
bin/launch_utils.py
Outdated
```
def memory_split_for_ray_init(memory_str: Union[str, None]) -> Tuple[float, float]:
    """
    compute the memory requirements for ray init.
```
Processes the input memory value into the right unit and allocates 30% for overhead and 70% for tuning.
Co-authored-by: mathysgrapotte <[email protected]>
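A hedged sketch of what such a helper could look like, assuming Nextflow-style memory strings such as "8.GB". The unit table, the handling of `None`, and the return unit (bytes) are assumptions; only the signature and the 30%/70% split come from the diff and the comment above.

```
from typing import Tuple, Union

# Assumed multipliers for Nextflow-style memory strings such as "8.GB";
# the real parsing in bin/launch_utils.py may differ.
_UNITS = {"KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}


def memory_split_for_ray_init(memory_str: Union[str, None]) -> Tuple[float, float]:
    """Split the available memory: 30% for overhead, 70% for tuning (in bytes)."""
    if memory_str is None:
        # Assumption: fall back to 1 GB when no memory directive is given.
        memory_str = "1.GB"
    value, unit = memory_str.strip().split(".")
    total_bytes = float(value) * _UNITS[unit.upper()]
    return total_bytes * 0.3, total_bytes * 0.7
```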
The changes revolve around creating the first nf-test for the data-handling workflow.
All the other improvements try to prevent deadlocks from happening. When a deadlock happens, Ray Tune gets stuck in either PENDING or RUNNING state without any error. To prevent that, a number of changes were necessary:
- If Ray is initialized through tuner.fit() (which calls ray.init()) on the cluster, it reads the available resources completely wrong: it sees far more than what is actually allocated to it. That is why Ray is now initialized explicitly with a given set of values, as sketched below.
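A sketch of that explicit initialization; the resource values below are illustrative stand-ins for whatever Nextflow actually allocates to the process, not the values used in the pipeline.

```
import ray

# Initialize Ray explicitly instead of letting tuner.fit() call ray.init()
# with auto-detected (and, on the cluster, wildly overestimated) resources.
num_cpus = 4                       # CPUs allocated by Nextflow for this process (assumed)
num_gpus = 0                       # GPU count reported for the process (assumed)
object_store_memory = 2 * 1024**3  # e.g. the 30% "overhead" share, in bytes (assumed)

ray.init(
    num_cpus=num_cpus,
    num_gpus=num_gpus,
    object_store_memory=object_store_memory,
    ignore_reinit_error=True,      # the later tuner.fit() call reuses this session
)
```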