Commit

update benchmark results
salykova committed Nov 16, 2024
1 parent 4d7b9a7 commit 6fa5c50
Showing 5 changed files with 126 additions and 120 deletions.
12 changes: 6 additions & 6 deletions README.md
@@ -1,15 +1,15 @@
# Fast, Multi-threaded Matrix Multiplication in C from Scratch
# High-Performance FP32 Matrix Multiplication on CPU

> **Important note:** Please don’t expect peak performance without fine-tuning hyperparameters such as the *number of threads, kernel size and block sizes*, unless you're running it on a Ryzen 7700(X). The current implementation includes a single kernel and a parallelization strategy, both optimized for AMD Zen CPUs. For manycore processors (> 16 cores), consider utilizing nested parallelism and parallelizing 2-3 loops to increase the performance (e.g., the 5th, 3rd, and 2nd loops around the kernel). More on this in the [tutorial](https://salykova.github.io/matmul-cpu).
## Key Features
- Step by step, beginner-friendly [tutorial](https://salykova.github.io/matmul-cpu)
- Performance comparable to OpenBLAS and MKL
- Simple and scalable C code
- Supports arbitrary matrix sizes
- Faster than OpenBLAS and MKL on Ryzen 7700
- Efficiently parallelized with just 3 lines of OpenMP directives
- Targets x86 processors with AVX2 and FMA3 instructions (=all modern Intel Core and AMD Ryzen CPUs)
- Efficiently parallelized with 3 lines of OpenMP directives
- Targets x86 processors with AVX2 and FMA3 instructions
- Follows the [BLIS](https://github.com/flame/blis) design
- Step by step, beginner-friendly [tutorial](https://salykova.github.io/matmul-cpu)
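
The note on manycore processors suggests parallelizing more than one loop around the kernel. A minimal sketch of that idea follows, under loud assumptions: it is not the repository's kernel, it uses plain scalar code instead of AVX2/FMA3 intrinsics, it skips packing, it assumes row-major storage, and the block sizes `MC`, `NC`, `KC` and the function name `matmul_blocked` are hypothetical. It merges the two loops over tiles of `C` into one parallel iteration space with a single OpenMP directive.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical block sizes for illustration; not the tuned values from the repository. */
enum { MC = 64, NC = 64, KC = 64 };

static int min_int(int a, int b) { return a < b ? a : b; }

/* Computes C += A * B, where A is m x k, B is k x n, C is m x n.
   All matrices are assumed row-major and contiguous. */
void matmul_blocked(const float *A, const float *B, float *C, int m, int n, int k)
{
    assert(m >= 0 && n >= 0 && k >= 0);

    /* The two loops over tiles of C are collapsed into one parallel loop.
       Each (ic, jc) tile of C is updated by exactly one thread, so no
       synchronization on C is needed. Without -fopenmp the pragma is
       ignored and the code runs serially. */
    #pragma omp parallel for collapse(2) schedule(static)
    for (int ic = 0; ic < m; ic += MC) {
        for (int jc = 0; jc < n; jc += NC) {
            int i_end = min_int(ic + MC, m);
            int j_end = min_int(jc + NC, n);
            /* The loop over the shared dimension stays serial inside each tile. */
            for (int pc = 0; pc < k; pc += KC) {
                int p_end = min_int(pc + KC, k);
                for (int i = ic; i < i_end; i++) {
                    for (int p = pc; p < p_end; p++) {
                        float a = A[(size_t)i * k + p];
                        for (int j = jc; j < j_end; j++)
                            C[(size_t)i * n + j] += a * B[(size_t)p * n + j];
                    }
                }
            }
        }
    }
}
```

Compiling with `gcc -O2 -fopenmp` enables the directive. Collapsing the tile loops exposes more independent work items than parallelizing a single loop, which is the reason to consider it when the core count exceeds the number of tiles along one dimension.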

## Installation
Install the following packages via `apt` if you are using a Debian-based Linux distribution
@@ -40,7 +40,7 @@ Test environment:
- CPU LOCKED CLOCK SPEED: 4.5GHz
- RAM: 32GB DDR5 6000 MHz CL36
- OpenBLAS v.0.3.26
- MKL v2023.1
- MKL 2023.1
- Compiler: GCC 11.4.0
- OS: Ubuntu 22.04.4 LTS

Binary file modified assets/perf.png
78 changes: 40 additions & 38 deletions assets/ryzen7700_perf/benchmark_matmul.txt
@@ -1,38 +1,40 @@
200 4 730 493
400 146 980 936
600 444 1022 1001
800 586 1062 1043
1000 867 1068 1061
1200 672 1077 1054
1400 772 1069 1051
1600 985 1077 1070
1800 795 1059 1045
2000 936 1065 1053
2200 1019 1050 1045
2400 928 1059 1045
2600 883 1052 1034
2800 1036 1060 1052
3000 1035 1058 1053
3200 1024 1062 1057
3400 1040 1061 1057
3600 1015 1070 1059
3800 1018 1070 1058
4000 1027 1077 1069
4200 1034 1067 1062
4400 1049 1074 1068
4600 1023 1075 1060
4800 1045 1077 1067
5000 1057 1077 1073
5200 1049 1077 1069
5400 1059 1077 1071
5600 1058 1079 1076
5800 1071 1080 1077
6000 1054 1085 1081
6200 1069 1078 1075
6400 1056 1084 1076
6600 1056 1081 1076
6800 1080 1085 1083
7000 1070 1082 1078
7200 1065 1085 1078
7400 1069 1082 1079
7600 1072 1085 1081
200 15 734 615
400 150 981 935
600 337 1026 1006
800 785 1060 1047
1000 761 1069 1055
1200 863 1077 1065
1400 938 1069 1061
1600 1012 1076 1071
1800 941 1064 1054
2000 968 1071 1057
2200 969 1050 1039
2400 967 1060 1042
2600 1021 1054 1049
2800 986 1066 1057
3000 1036 1068 1063
3200 1017 1065 1059
3400 978 1061 1048
3600 1054 1071 1067
3800 1055 1070 1065
4000 1065 1077 1073
4200 1056 1069 1065
4400 1068 1076 1074
4600 1041 1072 1065
4800 1071 1082 1078
5000 1069 1079 1075
5200 1056 1077 1072
5400 1069 1075 1073
5600 1075 1080 1078
5800 1068 1080 1075
6000 1073 1085 1081
6200 1056 1076 1071
6400 1075 1082 1080
6600 1075 1081 1079
6800 1074 1085 1081
7000 1077 1081 1079
7200 1077 1082 1080
7400 1076 1081 1079
7600 1071 1082 1081
7800 1067 1076 1073
8000 1074 1079 1077
78 changes: 40 additions & 38 deletions assets/ryzen7700_perf/benchmark_mkl.txt
@@ -1,38 +1,40 @@
200 9 591 529
400 700 813 803
600 784 907 884
800 886 950 939
1000 884 948 944
1200 798 988 979
1400 867 988 963
1600 856 976 956
1800 890 951 933
2000 893 935 928
2200 877 967 935
2400 865 974 964
2600 736 977 866
2800 919 961 954
3000 771 977 871
3200 897 931 917
3400 724 740 731
3600 915 941 924
3800 721 732 727
4000 924 941 930
4200 716 724 719
4400 922 934 927
4600 712 719 715
4800 909 925 914
5000 715 720 717
5200 911 925 919
5400 710 715 713
5600 925 943 933
5800 721 725 723
6000 892 943 935
6200 714 721 718
6400 907 918 914
6600 723 727 725
6800 934 943 939
7000 717 719 718
7200 938 948 943
7400 723 727 725
7600 942 951 947
200 152 600 555
400 653 816 780
600 757 895 871
800 796 941 921
1000 896 952 949
1200 940 993 978
1400 908 990 973
1600 914 986 970
1800 904 963 951
2000 946 963 960
2200 890 976 957
2400 893 982 940
2600 813 986 915
2800 762 838 803
3000 756 975 854
3200 750 960 869
3400 735 757 747
3600 947 965 959
3800 727 748 742
4000 952 969 961
4200 726 739 734
4400 937 971 958
4600 717 728 725
4800 936 961 944
5000 715 726 721
5200 933 944 939
5400 716 721 718
5600 940 960 951
5800 721 725 724
6000 927 957 948
6200 717 721 719
6400 926 939 932
6600 724 729 727
6800 950 964 955
7000 720 723 722
7200 954 965 959
7400 725 728 727
7600 956 968 961
7800 725 727 726
8000 949 959 952
78 changes: 40 additions & 38 deletions assets/ryzen7700_perf/benchmark_openblas.txt
@@ -1,38 +1,40 @@
200 52 543 514
400 81 788 718
600 143 698 658
800 336 897 852
1000 496 835 824
1200 828 931 924
1400 916 984 979
1600 908 997 990
1800 991 1061 1056
2000 942 991 984
2200 980 1052 1047
2400 965 996 990
2600 862 1041 1031
2800 931 1013 999
3000 1005 1038 1031
3200 1065 1078 1074
3400 1025 1035 1033
3600 1049 1064 1057
3800 982 1027 1020
4000 1024 1056 1048
4200 968 1034 1021
4400 1031 1044 1039
4600 1042 1053 1049
4800 1011 1046 1028
5000 1067 1069 1068
5200 1028 1051 1046
5400 1044 1079 1073
5600 1030 1048 1042
5800 1041 1069 1062
6000 1041 1059 1057
6200 1056 1071 1068
6400 1081 1092 1089
6600 1058 1067 1065
6800 1069 1085 1080
7000 1049 1063 1060
7200 1063 1079 1073
7400 1046 1062 1058
7600 1049 1073 1068
200 40 495 464
400 106 765 721
600 451 854 825
800 760 934 920
1000 856 947 925
1200 903 1005 1000
1400 983 1009 998
1600 960 1057 1054
1800 989 1044 1036
2000 1012 1052 1050
2200 1018 1057 1051
2400 1013 1045 1030
2600 1024 1049 1044
2800 1027 1053 1037
3000 1048 1071 1067
3200 1035 1065 1055
3400 1039 1057 1051
3600 1042 1066 1058
3800 1043 1053 1050
4000 1032 1065 1053
4200 1032 1068 1058
4400 1059 1074 1068
4600 1041 1062 1055
4800 1059 1069 1065
5000 1050 1057 1053
5200 1062 1080 1075
5400 1066 1076 1072
5600 1063 1074 1065
5800 1055 1074 1068
6000 1061 1076 1070
6200 1041 1070 1059
6400 1059 1071 1068
6600 1064 1071 1067
6800 1067 1079 1075
7000 1060 1072 1063
7200 1069 1071 1070
7400 1059 1064 1063
7600 1060 1070 1068
7800 1060 1070 1064
8000 1055 1074 1067
