Negative Logits
Token        Logit
Fare         -0.08
Further      -0.07
forbidden    -0.07
Her          -0.07
_LOCAL       -0.07
Moy          -0.06
"↵↵          -0.06
متح          -0.06
questo       -0.06
breathing    -0.06
Positive Logits
Token        Logit
RA           0.07
_DA          0.07
Ä            0.07
GA           0.06
also         0.06
ha           0.06
iP           0.06
(...         0.06
ีค           0.06
naj          0.06
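The two lists above give the ten tokens with the most negative and the most positive logit values. The sketch below shows one way such lists could be produced. It assumes a hypothetical dict, logit_effects, that maps every vocabulary token to its logit value. The dashboard does not say how its values are computed, so the function name and input format are illustrative only.

```python
def top_and_bottom(logit_effects, k=10):
    """Hypothetical helper: return the k most negative and k most positive
    (token, value) pairs from a dict mapping token -> logit effect."""
    # Sort ascending by value, so the most negative entries come first.
    ranked = sorted(logit_effects.items(), key=lambda kv: kv[1])
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return negative, positive


# Toy values, not taken from the dashboard.
toy = {"a": 0.07, "b": -0.08, "c": 0.0, "d": -0.06, "e": 0.06}
neg, pos = top_and_bottom(toy, k=2)
print(neg)  # [('b', -0.08), ('d', -0.06)]
print(pos)  # [('a', 0.07), ('e', 0.06)]
```

If the vocabulary has fewer than 2k tokens, the two lists will overlap. A real vocabulary of tens of thousands of tokens avoids this.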
Activation Density: 0.100%
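An activation density of 0.100% means the feature fires on roughly 1 token in 1,000. The minimal sketch below shows how that figure could be computed. It assumes that "active" means an activation strictly greater than a threshold of 0. The function name and the threshold are hypothetical, because the dashboard does not state its exact criterion.

```python
def activation_density(activations, threshold=0.0):
    """Hypothetical helper: fraction of token positions where the feature
    is active, i.e. its activation exceeds `threshold`."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy data: one active position out of 1,000 tokens.
acts = [0.0] * 999 + [2.5]
density = activation_density(acts)
print(f"{density:.3%}")  # 0.100%
```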