add kernel 4 to docs. have to improve these docs more and document them better
karpathy committed May 2, 2024
1 parent 2feb9ff commit 2202c9a
Showing 1 changed file with 2 additions and 1 deletion.
3 changes: 2 additions & 1 deletion dev/cuda/classifier_fused.cu
@@ -10,6 +10,7 @@ nvcc -O3 --use_fast_math classifier_fused.cu -o classifier_fused
 ./classifier_fused 1
 ./classifier_fused 2
 ./classifier_fused 3
+./classifier_fused 4
 */

 #include <stdio.h>
@@ -448,7 +449,7 @@ __global__ void fused_classifier_kernel4(float* dlogits, float* losses, float* p
     // calculate the probability needed for the loss and update (single-threaded)
     if(threadIdx.x == 0) {
         float prob = expf(logits[idx * P + ix] - sp.Offset) * sp.Scale;
-        losses[idx] = -logf(prob);
+        losses[idx] = -logf(prob);
     }

     // very sensible default for dlosses is 1/(B*T), which is the uniform loss
