Negative Logits
Token         Logit
Programming   -0.08
comfy         -0.08
terlihat      -0.08
eleration     -0.08
hasa          -0.08
enschaft      -0.08
.hour         -0.08
vrucht        -0.08
[...,         -0.08
(*.           -0.08
Positive Logits
Token         Logit
losses        0.14
theft         0.14
चोरी          0.13
pérdidas      0.13
loss          0.12
pertes        0.11
盗            0.11
thieves       0.11
Theft         0.11
Loss          0.11
Activations Density 0.053%