Explanations
nothing, Not, nowhere, nada, rien
Negative Logits
aro     -0.10
abol    -0.09
okers   -0.08
inker   -0.08
wsz     -0.08
isha    -0.08
ousse   -0.08
uti     -0.08
nage    -0.08
oker    -0.08
Positive Logits
nothing   0.94
Nothing   0.77
nothing   0.76
NOTHING   0.72
Nothing   0.70
nichts    0.64
nada      0.62
rien      0.56
ничего    0.54
nulla     0.39
Activations Density 0.211%