# StarPU --- Runtime system for heterogeneous multicore architectures.
#
# Copyright (C) 2009-2024 University of Bordeaux, CNRS (LaBRI UMR 5800), Inria
#
# StarPU is free software; you can redistribute it and/or modify
# it under the terms of the GNU Lesser General Public License as published by
# the Free Software Foundation; either version 2.1 of the License, or (at
# your option) any later version.
#
# StarPU is distributed in the hope that it will be useful, but
# WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
#
# See the GNU Lesser General Public License in COPYING.LGPL for more details.
#
StarPU 1.5.0
==============================================
Changes:
* Rename hierarchical tasks to recursive tasks
* Fix asynchronous partitioning with data without home node
* Allow large sizes for vector, matrix, block, tensor and ndim data
interfaces, and use proper MPI datatypes to exchange them.
* Add soon_callback in tasks.
Small changes:
* Fix build system for StarPU Python interface
* When defined, the variable STARPU_PERF_MODEL_DIR will be used to
dump perfmodel files.
* Check CUDA and HIP pointers on on-GPU data registration.
New features:
* Add starpu_data_register_victim_selector to let schedulers select eviction
victims.
* Add bus performance model for HIP driver.
* New scheduler darts (Data-Aware Reactive Task Scheduling)
* Add basic support for nOS-V hypervision
* Enable STARPU_MPI_THREAD_MULTIPLE_SEND by default on mpich, openmpi ≥ 4
and Mad-MPI.
Small features:
* Add FXT option -use-task-color to propagate the specified task
color to the contexts
* Add flag STARPU_SCHED_SIMPLE_FIFOS_BELOW_READY_FIRST and
environment variable STARPU_SCHED_FIFO_READY_FIRST to let FIFO
components pick up ready tasks first.
* Allow scheduling policies that are not in the list of predefined
policies to be loaded with STARPU_SCHED and related variables
StarPU 1.4.8
==============================================
Small features:
* Add png curve generation to starpu_perfmodel_plot
* Add STARPU_MPI_THREAD_MULTIPLE_SEND environment variable to enable parallel
sending with MPI.
* Add starpu_tag_clear
* Add starpu_cublasLt_init/shutdown/get_local_handle helpers.
StarPU 1.4.7
==============================================
Small changes:
* Fix simgrid version of examples/mult
StarPU 1.4.6
==============================================
Small features:
* Add FXT option -use-task-color to propagate the specified task
color to the contexts
StarPU 1.4.5
==============================================
* Do not link libstarpu against libnvidia-ml
StarPU 1.4.4
==============================================
Small changes:
* Fix build system for StarPU Python interface
StarPU 1.4.3
==============================================
Small features:
* Add starpu_data_partition_readonly_downgrade_submit().
Small changes:
* StarPUPY no longer requires the python modules joblib and cloudpickle;
they are now optional
StarPU 1.4.2
==============================================
Small features:
* New functions starpu_mpi_data_cpy() and starpu_mpi_recv_prio()
* New functions starpu_bind_thread_on_worker(),
starpu_bind_thread_on_main(), starpu_bind_thread_on_cpu(),
and starpu_cpu_os_index()
* New macro STARPU_CUSOLVER_REPORT_ERROR
StarPU 1.4.1
==============================================
Small features:
* Add starpu_mpi_tags_{allocate,free} functions to manage MPI data
tags in distributed memory.
Changes:
* Fix StarPUPY when not using asyncio: we can use concurrent.futures
instead.
* Add STARPU_CODELET_PROFILING environment variable to disable codelet task
counting, so that applications can have const codelets.
* In performance bounds, take into account the standard deviation to get the
"expected" upper bound, in terms of expected optimistic deviation from the
average, rather than the average.
Small changes:
* Fix function starpu_mpi_wait_for_all()
* Fix building atomic functions with llvm on 32-bit systems.
* SOCL: Fix missing CL_CALLBACK for various callback functions
* Update prologue function names for parallel workers
StarPU 1.4.0
==============================================
New features:
* Add a starpu_mpi_task_submit-oriented way of submitting MPI tasks
with functions starpu_mpi_task_exchange_data_before_execution() and
starpu_mpi_task_exchange_data_after_execution()
* Possibility to specify different directories to store performance
model files with new variable STARPU_PERF_MODEL_PATH
* Checkpoint mechanism for MPI applications
* Transaction support
* OpenMP LLVM support
* Driver for HIP-based GPUs.
* Fault tolerance support with starpu_task_ft_failed().
* Julia programming interface.
* Add get_max_size method to data interfaces for applications using data with
variable size to express their maximal potential size.
* New offline tool to draw a graph showing the elapsed time between sent
or received data and their use by tasks
* Add 4D tensor data interface.
* New sched_tasks.rec trace file which monitors task scheduling push/pop actions
* New STARPU_MPI_MEM_THROTTLE environment variable to throttle MPI
submission according to memory use.
* New number_events.data trace file which monitors the number of events in trace
files. This file can be parsed by the new script
starpu_fxt_number_events_to_names.py to convert event keys to event names.
* New STARPU_PER_WORKER perfmodel.
* Add energy accounting in the simgrid mode: starpu_energy_use() and
starpu_energy_used().
* New function starpu_mpi_get_thread_cpuid() to know where the MPI
thread is bound.
* New function starpu_get_pu_os_index() to convert logical index of a PU to
its OS index.
* New function starpu_get_hwloc_topology() to get the hwloc topology used by
StarPU.
* Add a task prefetch level, to improve retaining data in accelerators so we
can make prefetch more aggressive.
* Add starpu_data_dup_ro().
* Add starpu_data_release_to() and starpu_data_release_to_on_node().
* Add profiling based on papi performance counters.
* Add an experimental python interface (not actually parallel yet)
* Add task submission file+line in traces.
* Add papi- and nvml-based energy measurement.
* Add starpu_mpi_datatype_node_register and
starpu_mpi_interface_datatype_node_register which will be needed for
MPI/NUMA/GPUDirect.
* Add peek_data interface method.
* Add support for dynamic broadcasts when StarPU-MPI is used with
NewMadeleine.
* New STARPU_MPI_RECV_WAIT_FINALIZE environment variable to wait until
the communication library has completely released the handle before
unlocking tasks (instead of just releasing the write lock). Only for NewMadeleine.
* Add STARPU_MPI_REDUX
* New StarPU Java Bindings
* Add starpu_data_query_status2 function.
* Add starpu_data_evict_from_node function.
* Add a StarPU Eclipse Plugin
* Add support for Maxeler FPGA accelerators.
* Add 4D tensors filters.
* Add n-dimension data interface and filters.
* New STARPU_FXT_EVENTS environment variable to select at runtime which
event categories have to be recorded.
* Add support for mpi_sync_clocks for more precise distributed traces.
* Add more worker states in STARPU_PROFILING: callback, waiting, scheduling.
* Support for hierarchical tasks
* Support mapping memory between CPU RAM and GPU RAM, instead of copying
data.
* New function starpu_get_memory_location_bitmap(), and record in traces on
which NUMA node the buffers used for MPI or tasks are located.
* TCP/IP-based master-slave support.
* Set STARPU_WORKERS_GETBIND to 1 by default, to inherit CPU binding from
the job scheduler.
* Add starpu_{vector,matrix,block,tensor,ndim}_filter_pick_variable.
* New operators for data interfaces pack_meta(), unpack_meta() and
free_meta() which are used in master slave mode for data
interfaces with a dynamic content.
* Add CUSOLVER support.
* Add STARPU_NOFOOTPRINT data access flag.
Small features:
* New function starpu_mpi_comm_register() to store the size and the
rank of the given communicator (functions starpu_mpi_comm_rank()
and starpu_mpi_comm_size() are updated to no longer call the MPI
functions directly)
* New configure option --with-check-cflags to define flags for C,
CXX and Fortran compilers
* FxT is now automatically enabled at build-time, but no longer enabled at
run-time by default; STARPU_FXT_TRACE needs to be explicitly set to
1 to enable FxT trace recording.
* Deprecate starpu_free() and add new function starpu_free_noflag()
to specify allocated size.
* Reuse matrix tiles that have different shapes but same allocation size.
* Add starpu_task_create_sync
* Add ram_colind/rowptr to csr and bcsr data interfaces. This makes
starpu_bcsr_filter_vertical_block work on several memory nodes.
* Add cuda0 and cuda1 example drivers.
* New STARPU_EXPECTED_TRANSFER_TIME_WRITEBACK environment variable to tune
transfer estimation times.
* Add tool starpu_config to display the configuration StarPU was
compiled with
* Possibility to enable data locality enforcement when choosing a
worker to run a task implementation
* New function starpu_data_partition_clean_node() to specify node on
which to gather data
* Move to the public API some scheduler utility functions
* New variable STARPU_SCHED_LIB to dynamically load a new scheduling
policy
* Enable GPUDirect when MPI supports it.
* Install a module file in lib/modules
* New function starpu_worker_wait_for_initialisation() which waits
for all workers to be initialised
* Add in the public API the codelet starpu_codelet_nop which has an
empty function defined for all drivers
* Add starpu_task_expected_length_average and
starpu_task_expected_energy_average.
* Add STARPU_SIMGRID_TASK_PUSH_COST environment variable.
* Add starpu_memory_nodes_get_count_by_kind and
starpu_memory_node_get_ids_by_type.
* Add STARPU_MPI_REDUX_ARITY_THRESHOLD to tune the type of tree used in
distributed-memory reduction patterns that are automatically detected.
* New function starpu_data_set_reduction_methods_with_args() to
specify arguments to pass to the reduction and init tasks
Changes:
* The redux codelet should expose the STARPU_COMMUTE flag, since StarPU
actually uses commutability.
* Rename STARPU_COMM_STATS environment variable to STARPU_MPI_STATS
* Function starpu_data_lookup has been removed; it is now up to the
calling code to manage a ptr-to-handle reverse lookup table when
needed.
* Clusters are renamed to parallel workers; the old API is kept as
deprecated
* Remove the pop_every_task scheduler method, which has long been unused.
Small changes:
* starpu_mpi_task_insert() returns -ENODEV if no worker is available
on the node which is to execute the codelet (the other nodes do
not return -ENODEV)
* Add a synthetic energy efficiency testcase.
* Make reduction methods want the commute flag.
* Delete old MIC driver code
* Rename
- starpu_conf::sched_policy_init to starpu_conf::sched_policy_callback
and
- starpu_sched_ctx_get_sched_policy_init() to starpu_sched_ctx_get_sched_policy_callback()
as the callback function may not only be used for init purposes
* Change the default value for configure option --enable-maxcpus to
auto. It allows StarPU to automatically use the number of CPUs
on the build machine.
* New option --worker for tool starpu_machine_display to only
display workers of a specific type
* Remove the unused and untested mpi_ms_funcs field.
* The home_node parameter of the register_data_handle method is turned from
unsigned to int, to make explicit that it may be -1.
* Value 0 for STARPU_MPI_NDETACHED_SEND and STARPU_MPI_NREADY_PROCESS will
now disable their behaviour.
* Distributed-memory reduction patterns are automatically wrapped up if the user
does not call starpu_mpi_redux_data()
* Remove starpu_data_pointer_is_inside().
StarPU 1.3.12
====================================================================
Small changes:
* Add starpu_data_deinitialize and starpu_data_deinitialize_submit
StarPU 1.3.11
====================================================================
Small changes:
* Fix building with cuda 12
StarPU 1.3.10
====================================================================
Small features:
* Add starpu_worker_get_current_task_exp_end.
Small changes:
* Change the default value for configure option --enable-maxcpus to
auto. It allows StarPU to automatically use the number of CPUs
on the build machine.
StarPU 1.3.9
====================================================================
Small changes:
* Add missing interface macros for BCSR data interface
StarPU 1.3.8
====================================================================
Small features:
* A codelet can now define a callback function pointer which will be
automatically called when the task does not itself define a
callback function; when the task does define one, the codelet
callback can still be called from the task callback function.
* New STARPU_WORKERS_COREID, STARPU_MAIN_THREAD_COREID and
STARPU_MPI_THREAD_COREID environment variables to bind threads to cores
instead of hyperthreads.
* New STARPU_TASK_PROGRESS environment variable to show task progression.
* Add a STARPU_SIMGRID environment variable guard against native builds.
* Add starpu_cuda_get_nvmldev function.
* New configure option --with-check-cflags to define flags for C,
CXX and Fortran compilers
* Add starpu_sched_tree_deinitialize function.
* Add STARPU_SCHED_SORTED_ABOVE and STARPU_SCHED_SORTED_BELOW environment
variables.
* Add STARPU_SCHED_SIMPLE_PRE_DECISION.
* Add starpu_bcsr_filter_canonical_block_get_nchildren.
* Add unregister_data_handle handle ops.
StarPU 1.3.7
====================================================================
Small changes:
* Simgrid: bug fix for setting network/weight-S to 0.0
StarPU 1.3.6 (git revision fb9fbed81410d9f0ebbff5bdad1352df4705efe8)
====================================================================
Small features:
* New STARPU_BACKOFF_MIN and STARPU_BACKOFF_MAX environment variables to set the
exponential backoff limits of the number of cycles to pause while drivers
are spinning.
* Add STARPU_DISPLAY_BINDINGS environment variable and
starpu_display_bindings() function to display all bindings on the machine by
calling hwloc-ps
* New function starpu_get_pu_os_index() to convert logical index of a PU to
its OS index.
* New function starpu_get_hwloc_topology() to get the hwloc topology used by
StarPU.
StarPU 1.3.5 (git revision 5f7458799f548026fab357b18541bb462dde2b53)
====================================================================
Small features:
* New environment variable STARPU_FXT_SUFFIX to set the filename in
which to save the fxt trace
* New option -d for starpu_fxt_tool to specify in which directory to
generate files
Small changes:
* Move MPI cache functions into the public API
* Add STARPU_MPI_NOBIND environment variable.
StarPU 1.3.4 (git revision c37a5d024cd997596da41f765557c58099baf896)
====================================================================
Small features:
* New environment variables STARPU_BUS_STATS_FILE and
STARPU_WORKER_STATS_FILE to specify files in which to display
statistics about data transfers and workers.
* Add starpu_bcsr_filter_vertical_block filtering function.
* Add starpu_interface_copy2d, 3d, and 4d to easily request data copies from
data interfaces.
* Move optimized cuda 2d copy from interfaces to new
starpu_cuda_copy2d_async_sync and starpu_cuda_copy3d_async_sync, and use
them from starpu_interface_copy2d and 3d.
* New function starpu_task_watchdog_set_hook to specify a function
to be called when the watchdog is raised
* Add STARPU_LIMIT_CPU_NUMA_MEM environment variable.
* Add STARPU_WORKERS_GETBIND environment variable.
* Add STARPU_SCHED_SIMPLE_DECIDE_ALWAYS modular scheduler flag.
* Add STARPU_LIMIT_BANDWIDTH environment variable.
* Add field starpu_conf::precedence_over_environment_variables to ignore
environment variables when parameters are set directly in starpu_conf
* Add starpu_data_get_coordinates_array
* MPI: new functions starpu_mpi_interface_datatype_register() and
starpu_mpi_interface_datatype_unregister() which take an enum
starpu_data_interface_id instead of a starpu_data_handle_t
* New script starpu_env to set up StarPU environment variables
Small changes:
* New configure option --disable-build-doc-pdf
StarPU 1.3.3 (git revision 11afc5b007fe1ab1c729b55b47a5a98ef7f3cfad)
====================================================================
New features:
* New semantic for starpu_task_insert() and alike parameters
STARPU_CALLBACK_ARG, STARPU_PROLOGUE_CALLBACK_ARG, and
STARPU_PROLOGUE_CALLBACK_POP_ARG which set respectively
starpu_task::callback_arg_free,
starpu_task::prologue_callback_arg_free and
starpu_task::prologue_callback_pop_arg_free to 1 when used.
New parameters STARPU_CALLBACK_ARG_NFREE,
STARPU_CALLBACK_WITH_ARG_NFREE, STARPU_PROLOGUE_CALLBACK_ARG_NFREE, and
STARPU_PROLOGUE_CALLBACK_POP_ARG_NFREE which set the corresponding
fields of starpu_task to 0.
* starpufft: Support 3D.
* New modular-eager-prio scheduler.
* Add 'ready' heuristic to modular schedulers.
* New modular-heteroprio scheduler.
* Add STARPU_TASK_SCHED_DATA
* Add support for staging schedulers.
* New modular-heteroprio-heft scheduler.
* New dmdap "data-aware performance model (priority)" scheduler
Changes:
* Modification in the Native Fortran interface of the functions
fstarpu_mpi_task_insert, fstarpu_mpi_task_build and
fstarpu_mpi_task_post_build to only take 1 parameter being the MPI
communicator, the codelet and the various parameters for the task.
Small features:
* New starpu_task_insert() and alike parameter STARPU_TASK_WORKERIDS
allowing to set the fields starpu_task::workerids_len and
starpu_task::workerids
* New starpu_task_insert() and alike parameters
STARPU_SEQUENTIAL_CONSISTENCY, STARPU_TASK_NO_SUBMITORDER and
STARPU_TASK_PROFILING_INFO
* New function starpu_create_callback_task() which creates and
submits an empty task with the specified callback
* Use the S4U interface of Simgrid instead of xbt and MSG.
Small changes:
* Default modular worker queues to 2 tasks unless it's an heft
scheduler
* Separate out STATUS_SLEEPING_SCHEDULING state from
STATUS_SLEEPING state.
When running the scheduler while being idle, workers do not go into
the STATUS_SCHEDULING state, so that that time is considered as
idle time instead of overhead.
StarPU 1.3.2 (git revision af22a20fc00a37addf3cc6506305f89feed940b0)
====================================================================
Small changes:
* Improve OpenMP support to detect whether the environment is valid before
launching OpenMP
* Delete old code (drivers gordon, scc, starpu-top, and plugin gcc)
and update authors file accordingly
* Add Heteroprio documentation (including a simple example)
* Add a progression hook, to be called when workers are idle, which
is used in the NewMadeleine implementation of StarPU-MPI to ensure
communications progress.
StarPU 1.3.1 (git revision 01949488b4f8e6fe26d2c200293b8aae5876b038)
====================================================================
Small features:
* Add starpu_filter_nparts_compute_chunk_size_and_offset helper.
* Add starpu_bcsr_filter_canonical_block_child_ops.
Small changes:
* Improve detection of NVML availability. Do not only check the
library is available, also check the compiled code can be run.
StarPU 1.3.0 (git revision 24ca83c6dbb102e1cfc41db3bb21c49662067062)
====================================================================
New features:
* New scheduler 'heteroprio' with heterogeneous priorities
* Support priorities for data transfers.
* Add support for multiple linear regression performance models
- Bump performance model file format version to 45.
* Add MPI Master-Slave support to use the cores of remote nodes. Use the
--enable-mpi-master-slave option to activate it.
* Add STARPU_CUDA_THREAD_PER_DEV environment variable to support driving all
GPUs from only one thread when almost all kernels are asynchronous.
* Add starpu_replay tool to replay tasks.rec files with Simgrid.
* Add experimental support of NUMA nodes. Use STARPU_USE_NUMA to activate it.
* Add a new set of functions to make Out-of-Core based on HDF5 Library.
* Add a new implementation of StarPU-MPI on top of NewMadeleine
* Add optional callbacks to notify an external resource manager
about workers going to sleep and waking up
* Add implicit support for asynchronous partition planning. This means one
does not need to call starpu_data_partition_submit() etc. explicitly any
more, StarPU will make the appropriate calls as needed.
* Add starpu_task_notify_ready_soon_register() to be notified when StarPU
determines that a task will be ready an estimated amount of time from now.
* New StarPU-MPI initialization function (starpu_mpi_init_conf())
which allows StarPU-MPI to manage reserving a core for the MPI thread, or
merging it with CPU driver 0.
* Add possibility to delay the termination of a task with the
functions starpu_task_end_dep_add() which specifies the number of
calls to the function starpu_task_end_dep_release() needed to
trigger the task termination, or with starpu_task_declare_end_deps_array()
and starpu_task_declare_end_deps() to just declare termination dependencies
between tasks.
* Add possibility to define the sequential consistency at the task level
for each handle used by the task.
* Add STARPU_SPECIFIC_NODE_LOCAL, STARPU_SPECIFIC_NODE_CPU, and
STARPU_SPECIFIC_NODE_SLOW as generic values for codelet specific memory
nodes which can be used instead of exact node numbers.
* Add starpu_get_next_bindid() and starpu_bind_thread_on() to allow
binding an application-started thread on a free core. Use it in
StarPU-MPI to automatically bind the MPI thread on an available core.
* Add STARPU_RESERVE_NCPU environment variable and
starpu_config::reserve_ncpus field to make StarPU use a few cores
less.
* Add STARPU_MAIN_THREAD_BIND environment variable to make StarPU reserve a
core for the main thread.
* New StarPU-RM resource management module to share processor cores and
accelerator devices with other parallel runtime systems. Use
--enable-starpurm option to activate it.
* New schedulers modular-gemm, modular-pheft, modular-prandom and
modular-prandom-prio
* Add STARPU_MATRIX_SET_NX/NY/LD and STARPU_VECTOR_SET_NX to change a matrix
tile or vector size without reallocating the buffer.
* Application can change the allocation used by StarPU with
starpu_malloc_set_hooks()
* XML output for starpu_perfmodel_display and starpu_perfmodel_dump_xml()
function
Small features:
* Scheduling contexts may now be associated with a user data pointer at creation
time, that can later be recalled through starpu_sched_ctx_get_user_data().
* New environment variables STARPU_SIMGRID_TASK_SUBMIT_COST and
STARPU_SIMGRID_FETCHING_INPUT_COST to simulate the cost of task
submission and data fetching in simgrid mode.
This provides more accurate simgrid predictions, especially for the
beginning of the execution and regarding data transfers.
* New environment variable STARPU_SIMGRID_SCHED_COST to take into
account the time to perform scheduling when running in SimGrid mode.
* New configure option --enable-mpi-pedantic-isend (disabled by
default) to acquire data in STARPU_RW (instead of STARPU_R) before
performing MPI_Isend() call
* New function starpu_worker_display_names() to display the names of
all the workers of a specified type.
* Arbiters now support concurrent read access.
* Add a field starpu_task::where similar to starpu_codelet::where
which allows restricting where a task is executed. Also add
STARPU_TASK_WHERE to be used when calling starpu_task_insert().
* Add SubmitOrder trace field.
* Add workerids and workerids_len task fields.
* Add priority management to StarPU-MPI. Can be disabled with
the STARPU_MPI_PRIORITIES environment variable.
* Add STARPU_MAIN_THREAD_CPUID and STARPU_MPI_THREAD_CPUID environment
variables.
* Add disk to disk copy functions and support asynchronous full read/write
in disk backends.
* New starpu_task_insert() parameter STARPU_CL_ARGS_NFREE which allows
setting codelet parameters without freeing them.
* New starpu_task_insert() parameter STARPU_TASK_DEPS_ARRAY which
allows to declare task dependencies similarly to
starpu_task_declare_deps_array()
* Add dependency backward information in debugging mode for gdb's
starpu-print-task
* Add sched_data field in starpu_task structure.
* New starpu_fxt_tool option -label-deps to label dependencies on
the output graph
* New environment variable STARPU_GENERATE_TRACE_OPTIONS to specify
fxt options (to be used with STARPU_GENERATE_TRACE)
* New function starpu_task_set() similar to starpu_task_build() but
with a task object given as the first parameter
* New functions
starpu_data_partition_submit_sequential_consistency() and
starpu_data_unpartition_submit_sequential_consistency()
* Add a new value STARPU_TASK_SYNCHRONOUS to be used in
starpu_task_insert() to define if the task is (or not) synchronous
* Add memory states events in the traces.
* Add starpu_sched_component_estimated_end_min_add() to fix termination
estimations in modular schedulers.
* New function starpu_data_partition_not_automatic() to disable the
automatic partitioning of a data handle for which an asynchronous
plan has previously been submitted
* Add starpu_task_declare_deps()
* New function starpu_data_unpartition_submit_sequential_consistency_cb()
to specify a callback for the task submitting the unpartitioning
* New tool starpu_mpi_comm_trace.py to draw heatmap of MPI
communications
* Support for ARM performance libraries
* Add functionality to disable signal catching either through field
starpu_conf::catch_signals or through the environment variable
STARPU_CATCH_SIGNALS
* Support for OpenMP Taskloop directive
* Optional data interface init function (used by the vector and
matrix interfaces)
Changes:
* Vastly improve simgrid simulation time.
* Switch default scheduler to lws.
* Add "to" parameter to pull_task and can_push methods of
components.
* Deprecate starpu_data_interface_ops::handle_to_pointer interface
operation in favor of new starpu_data_interface_ops::to_pointer
operation.
* Sort data access requests by priority.
* Cluster support is disabled by default, unless the configure
option --enable-cluster is specified
* For unpack operations, move the memory deallocation from
starpu_data_unpack() to the interface function
starpu_data_interface_ops::unpack_data(). Pack and unpack
functions of predefined interfaces
use public API starpu_malloc_on_node_flags() and
starpu_free_on_node_flags() to allocate and de-allocate memory
Small changes:
* Use asynchronous transfers for task data fetches which were not prefetched.
* Allow to call starpu_sched_ctx_set_policy_data on the main
scheduler context
* Function starpu_is_initialized() is moved to the public API.
* Fix code to allow to submit tasks to empty contexts
* STARPU_COMM_STATS also displays the bandwidth
* Update data interfaces implementations to only use public API
StarPU 1.2.11 (git revision xxx)
====================================================================
Small features:
* Add starpu_tag_notify_restart_from_apps().
StarPU 1.2.10 (git revision beb6ac9cc07dc9ae1c838a38d11ed2dae3775996)
====================================================================
Small features:
* New script starpu_env to set up StarPU environment variables
* New configure option --disable-build-doc-pdf
StarPU 1.2.9 (git revision 3aca8da3138a99e93d7f93905d2543bd6f1ea1df)
====================================================================
Small changes:
* Add STARPU_SIMGRID_TRANSFER_COST environment variable to easily disable
data transfer costs.
* New dmdap "data-aware performance model (priority)" scheduler
* Modification in the Native Fortran interface of the functions
fstarpu_mpi_task_insert, fstarpu_mpi_task_build and
fstarpu_mpi_task_post_build to only take 1 parameter being the MPI
communicator, the codelet and the various parameters for the task.
StarPU 1.2.8 (git revision f66374c9ad39aefb7cf5dfc31f9ab3d756bcdc3c)
====================================================================
Small features:
* Minor fixes
StarPU 1.2.7 (git revision 07cb7533c22958a76351bec002955f0e2818c530)
====================================================================
Small features:
* Add STARPU_HWLOC_INPUT environment variable to save initialization time.
* Add starpu_data_set_ooc_flag and starpu_data_get_ooc_flag.
* Use starpu_mpi_tag_t (int64_t) for MPI communication tags

StarPU 1.2.6 (git revision 23049adea01837479f309a75c002dacd16eb34ad)
====================================================================
Small changes:
* Fix crash in the lws scheduler
* Avoid making hwloc load PCI topology when CUDA is not enabled

StarPU 1.2.5 (git revision 22f32916916d158e3420033aa160854d1dd341bd)
====================================================================
Small features:
* Add a new value STARPU_TASK_COLOR to be used in
starpu_task_insert() to pick the color of a task in dag.dot
* Add starpu_data_pointer_is_inside().
Changes:
* Do not export -lcuda -lcudart -lOpenCL in *starpu*.pc.

StarPU 1.2.4 (git revision 255cf98175ef462749780f30bfed21452b74b594)
====================================================================
Small features:
* Catch signals SIGINT and SIGSEGV to dump fxt trace files.
* New configure option --disable-icc to disable the compilation of
specific ICC examples
* Add starpu_codelet_pack_arg_init, starpu_codelet_pack_arg,
starpu_codelet_pack_arg_fini for more fine-grained packing capabilities.
* Add starpu_task_insert_data_make_room,
starpu_task_insert_data_process_arg,
starpu_task_insert_data_process_array_arg,
starpu_task_insert_data_process_mode_array_arg
* Do not show internal tasks in fxt dag by default. Allow to hide
acquisitions too.
* Add a way to choose the dag.dot colors.

StarPU 1.2.3 (git revision 586ba6452a8eef99f275c891ce08933ae542c6c2)
====================================================================
New features:
* Add per-node MPI data.
Small features:
* When debug is enabled, starpu data accessors first check the
validity of the data interface type
* Print disk bus performance when STARPU_BUS_STATS is set
* Add starpu_vector_filter_list_long filter.
* Data interfaces now define a name through the struct starpu_data_interface_ops
* StarPU-MPI:
- allow predefined data interfaces not to define an MPI datatype and
to be exchanged through pack/unpack operations
- New function starpu_mpi_comm_get_attr() which returns
the value of the attribute STARPU_MPI_TAG_UB, i.e. the upper
bound for tag values.
- New configure option --enable-mpi-verbose to manage the display of
extra MPI debug messages.
* Add STARPU_WATCHDOG_DELAY environment variable.
* Add a 'waiting' worker status
* Allow new value 'extra' for configure option --enable-verbose
Small changes:
* Add data_unregister event in traces
* StarPU-MPI
- push detached requests at the back of the testing list, so they
are tested last since they will most probably finish latest
* Automatically initialize handles on data acquisition when
reduction methods are provided, and make sure a handle is
initialized before trying to read it.

StarPU 1.2.2 (git revision a0b01437b7b91f33fb3ca36bdea35271cad34464)
===================================================================
New features:
* Add starpu_data_acquire_try and starpu_data_acquire_on_node_try.
* Add NVCC_CC environment variable.
* Add -no-flops and -no-events options to starpu_fxt_tool to make
traces lighter
* Add starpu_cusparse_init, starpu_cusparse_shutdown and
starpu_cusparse_get_local_handle for proper CUDA overlapping with cusparse.
* Allow precise debugging by setting the STARPU_TASK_BREAK_ON_PUSH,
STARPU_TASK_BREAK_ON_SCHED, STARPU_TASK_BREAK_ON_POP, and
STARPU_TASK_BREAK_ON_EXEC environment variables to the job_id
of a task. StarPU will raise SIGTRAP when the task is being
scheduled, pushed, popped by the scheduler, or executed.
Small features:
* New function starpu_worker_get_job_id(struct starpu_task *task)
which returns the job identifier for a given task
* Show package/numa topology in starpu_machine_display
* MPI: Add MPI communications in dag.dot
* Add STARPU_PERF_MODEL_HOMOGENEOUS_CPU environment variable to
allow having one perfmodel per CPU core
* Add starpu_perfmodel_arch_comb_fetch function.
* Add starpu_mpi_get_data_on_all_nodes_detached function.
Small changes:
* Output generated through STARPU_MPI_COMM has been modified to
allow easier automated checking
* MPI: Fix reactivity at the beginning of the application: when a
lot of ready requests have to be processed at the same time, the
pending requests still need to be polled from time to time.
* MPI: Fix gantt chart for starpu_mpi_irecv: it should use the
termination time of the request, not the submission time.
* MPI: Modify output generated through STARPU_MPI_COMM to allow
easier automated checking
* MPI: enable more tests in simgrid mode
* Use assumed-size instead of assumed-shape arrays for the native
Fortran API, for better backward compatibility.
* Fix odd ordering of CPU workers on CPUs due to GPUs stealing some
cores

StarPU 1.2.1 (git revision 473acaec8a1fb4f4c73d8b868e4f044b736b41ea)
====================================================================
New features:
* Add starpu_fxt_trace_user_event_string.
* Add starpu_tasks_rec_complete tool to add estimation times in tasks.rec
files.
* Add STARPU_FXT_TRACE environment variable.
* Add starpu_data_set_user_data and starpu_data_get_user_data.
* Add STARPU_MPI_FAKE_SIZE and STARPU_MPI_FAKE_RANK to allow simulating
execution of just one MPI node.
* Add STARPU_PERF_MODEL_HOMOGENEOUS_CUDA/OPENCL/MIC/SCC to share performance
models between devices, making calibration much faster.
* Add modular-heft-prio scheduler.
* Add starpu_cublas_get_local_handle helper.
* Add starpu_data_set_name, starpu_data_set_coordinates_array, and
starpu_data_set_coordinates to describe data, and starpu_iteration_push and
starpu_iteration_pop to describe tasks, for better offline trace analysis.
* New function starpu_bus_print_filenames() to display filenames
storing bandwidth/affinity/latency information, available through
tools/starpu_machine_display -i
* Add support for Ayudame version 2.x debugging library.
* Add starpu_sched_ctx_get_workers_list_raw, much less costly than
starpu_sched_ctx_get_workers_list
* Add starpu_task_get_name and use it to warn about dmda etc. using
a dumb policy when calibration is not finished
* MPI: Add functions to test for cached values
Changes:
* Fix performance regression of lws for small tasks.
* Improve native Fortran support for StarPU
Small changes:
* Fix type of data home node to allow users to pass -1 to define
temporary data
* Fix compatibility with simgrid 3.14

StarPU 1.2.0 (git revision 5a86e9b61cd01b7797e18956283cc6ea22adfe11)
====================================================================
New features:
* MIC Xeon Phi support
* SCC support
* New function starpu_sched_ctx_exec_parallel_code to execute a
parallel code on the workers of the given scheduler context
* MPI:
- New internal communication system: a unique tag is now used
for all communications, and a system of hashmaps storing
pending receives has been implemented on each node. Every
message is now coupled with an envelope, sent before the
corresponding data, which allows the receiver to allocate data
correctly and to submit the receive matching the envelope.
- New function
starpu_mpi_irecv_detached_sequential_consistency which
allows enabling or disabling sequential consistency for
the given data handle (sequential consistency will be
enabled or disabled based on the value of the function
parameter and the value of the sequential consistency
defined for the given data)
- New functions starpu_mpi_task_build() and
starpu_mpi_task_post_build()
- New flag STARPU_NODE_SELECTION_POLICY to specify a policy for
selecting a node to execute the codelet when several nodes
own data in W mode.
- New selection node policies can be un/registered with the
functions starpu_mpi_node_selection_register_policy() and
starpu_mpi_node_selection_unregister_policy()
- New environment variable STARPU_MPI_COMM which enables
basic tracing of communications.
- New function starpu_mpi_init_comm() which allows specifying
an MPI communicator.
* New STARPU_COMMUTE flag which can be passed along with STARPU_W or STARPU_RW to
let StarPU commute write accesses.
* Out-of-core support, through registration of disk areas as additional memory
nodes. It can be enabled programmatically or through the STARPU_DISK_SWAP*
environment variables.
* Reclaiming is now periodically done before memory becomes full. This can
be controlled through the STARPU_*_AVAILABLE_MEM environment variables.
* New hierarchical schedulers which allow users to easily build
their own scheduler, either by coding each "box" they want themselves, or by
combining existing boxes in StarPU. Hierarchical
schedulers have very interesting scalability properties.
* Add STARPU_CUDA_ASYNC and STARPU_OPENCL_ASYNC flags to allow asynchronous
CUDA and OpenCL kernel execution.
* Add STARPU_CUDA_PIPELINE and STARPU_OPENCL_PIPELINE to specify how
many asynchronous tasks are submitted in advance on CUDA and
OpenCL devices. Setting the value to 0 forces a synchronous
execution of all tasks.
* Add CUDA concurrent kernel execution support through
the STARPU_NWORKER_PER_CUDA environment variable.
* Add CUDA and OpenCL kernel submission pipelining, to overlap costs and allow
concurrent kernel execution on Fermi cards.
* New locality work stealing scheduler (lws).
* Add STARPU_VARIABLE_NBUFFERS to be set in cl.nbuffers, and nbuffers and
modes fields to the task structure, which allow defining codelets taking a
variable number of data handles.
* Add support for implementing OpenMP runtimes on top of StarPU
* New performance model format to better represent parallel tasks.
Used to provide estimations for the execution times of the
parallel tasks on scheduling contexts or combined workers.
* starpu_data_idle_prefetch_on_node and
starpu_idle_prefetch_task_input_on_node allow queuing prefetches to be done
only when the bus is idle.
* Make starpu_data_prefetch_on_node not forcibly flush data out, introduce
starpu_data_fetch_on_node for that.
* Add data access arbiters, to improve parallelism of concurrent data
accesses, notably with STARPU_COMMUTE.
* Anticipative writeback, to flush dirty data asynchronously before the
GPU device is full. Disabled by default. Use STARPU_MINIMUM_CLEAN_BUFFERS
and STARPU_TARGET_CLEAN_BUFFERS to enable it.
* Add starpu_data_wont_use to advise that a piece of data will not be used
in the near future.
* Enable anticipative writeback by default.
* New scheduler 'dmdasd' that considers priority when deciding on
which worker to schedule
* Add the capability to define specific MPI datatypes for
StarPU user-defined interfaces.
* Add tasks.rec trace output to make scheduling analysis easier.
* Add Fortran 90 module and example using it
* New StarPU-MPI gdb debug functions
* Generate animated html trace of modular schedulers.
* Add asynchronous partition planning. It only supports coherency through
the home node of data for now.
* Add STARPU_MALLOC_SIMULATION_FOLDED flag to save memory when simulating.
* Include application threads in the trace.
* Add starpu_task_get_task_scheduled_succs to get successors of a task.
* Add graph inspection facility for schedulers.
* New STARPU_LOCALITY flag to mark data which should be taken into account
by schedulers for improving locality.
* Experimental support for data locality in ws and lws.
* Add a preliminary framework for native Fortran support for StarPU
Small features:
* Tasks can now have a name (via the field const char *name of
struct starpu_task)
* New functions starpu_data_acquire_cb_sequential_consistency() and
starpu_data_acquire_on_node_cb_sequential_consistency() which allow
enabling or disabling sequential consistency
* New configure option --enable-fxt-lock which enables additional
trace events focused on locks behaviour during the execution
* Functions starpu_insert_task and starpu_mpi_insert_task are
renamed to starpu_task_insert and starpu_mpi_task_insert. The old
names are kept to avoid breaking old code.
* New configure option --enable-calibration-heuristic which allows
the user to set the maximum authorized deviation of the
history-based calibrator.
* Allow application to provide the task footprint itself.
* New function starpu_sched_ctx_display_workers() to display information
about the workers belonging to a given scheduler context
* The option --enable-verbose can be called with
--enable-verbose=extra to increase the verbosity
* Add codelet size, footprint and tag id in the paje trace.
* Add STARPU_TAG_ONLY, to specify a tag for traces without making StarPU
manage the tag.
* On Linux x86, spinlocks now block after a hundred tries. This avoids
typical 10ms pauses when the application thread tries to submit tasks.
* New function char *starpu_worker_get_type_as_string(enum starpu_worker_archtype type)
* Improve static scheduling by adding support for specifying the task
execution order.
* Add starpu_worker_can_execute_task_impl and
starpu_worker_can_execute_task_first_impl to optimize getting the
working implementations
* Add STARPU_MALLOC_NORECLAIM flag to allocate without running a reclaim if
the node is out of memory.
* New flag STARPU_DATA_MODE_ARRAY for the function family
starpu_task_insert to allow defining an array of data handles
along with their access modes.
* New configure option --enable-new-check to enable new testcases
which are known to fail
* Add starpu_memory_allocate and starpu_memory_deallocate to let the application declare
its own allocations to the reclaiming engine.
* Add STARPU_SIMGRID_CUDA_MALLOC_COST and STARPU_SIMGRID_CUDA_QUEUE_COST to
disable CUDA costs simulation in simgrid mode.
* Add starpu_task_get_task_succs to get the list of children of a given
task.
* Add starpu_malloc_on_node_flags, starpu_free_on_node_flags, and
starpu_malloc_on_node_set_default_flags to control the allocation flags
used for allocations done by StarPU.
* Ranges can be provided in STARPU_WORKERS_CPUID
* Add starpu_fxt_autostart_profiling to be able to avoid autostart.
* Add arch_cost_function perfmodel function field.
* Add STARPU_TASK_BREAK_ON_SCHED, STARPU_TASK_BREAK_ON_PUSH, and
STARPU_TASK_BREAK_ON_POP environment variables to debug schedulers.
* Add starpu_sched_display tool.
* Add starpu_memory_pin and starpu_memory_unpin to pin memory allocated
by other means than starpu_malloc.
* Add STARPU_NOWHERE to create synchronization tasks with data.
* Document how to switch between different views of the same data.
* Add STARPU_NAME to specify a task name from a starpu_task_insert call.
* Add configure option --disable-fortran to disable Fortran
* Add configure option --with-smpirun to give the path of the smpirun executable
* Add configure option --disable-build-tests to disable the build of tests
* Add starpu-all-tasks debugging support
* New function
void starpu_opencl_load_program_source_malloc(const char *source_file_name, char **located_file_name, char **located_dir_name, char **opencl_program_source)
which allocates the pointers located_file_name, located_dir_name
and opencl_program_source.
* Add submit_hook and do_schedule scheduler methods.
* Add starpu_sleep.
* Add starpu_task_list_ismember.
* Add _starpu_fifo_pop_this_task.
* Add STARPU_MAX_MEMORY_USE environment variable.
* Add starpu_worker_get_id_check().
* New function starpu_mpi_wait_for_all(MPI_Comm comm) that allows to
wait until all StarPU tasks and communications for the given
communicator are completed.
* New function starpu_codelet_unpack_args_and_copyleft() which
copies into a new buffer the values which have not been unpacked by
the current call
* Add STARPU_CODELET_SIMGRID_EXECUTE flag.
* Add STARPU_CODELET_SIMGRID_EXECUTE_AND_INJECT flag.
* Add STARPU_CL_ARGS flag for starpu_task_insert() and
starpu_mpi_task_insert() calls
Changes:
* Data interfaces (variable, vector, matrix and block) now define
pack and unpack functions
* StarPU-MPI: Fix for being able to receive data which have not yet
been registered by the application (i.e. it did not call
starpu_data_set_tag(), data are received as raw memory)
* StarPU-MPI: Fix for being able to receive data with the same tag
from several nodes (see mpi/tests/gather.c)
* Remove the long-deprecated cost_model fields and task->buffers field.
* Fix complexity of implicit task/data dependency, from quadratic to linear.
Small changes:
* Rename function starpu_trace_user_event() to
starpu_fxt_trace_user_event()
* "power" is renamed into "energy" wherever it applies, notably energy
consumption performance models
* Update starpu_task_build() to set starpu_task::cl_arg_free to 1 if
some arguments of type ::STARPU_VALUE are given.
* Simplify performance model loading API