Explanations
phrases indicating inaction or failure to act
Negative Logits
Token         Logit
asca          -0.19
.mvp          -0.15
oplan         -0.14
lur           -0.14
uliar         -0.14
919           -0.14
PressEvent    -0.14
nnen          -0.14
573           -0.14
Truy          -0.14
Positive Logits
Token         Logit
nothing        0.90
nothing        0.79
Nothing        0.79
NOTHING        0.75
Nothing        0.72
nada           0.65
nichts         0.59
rien           0.57
ничего         0.50
nulla          0.44
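The page does not say how these logit scores are computed. Assuming this is a sparse-autoencoder feature dashboard that follows a common convention, each score is the feature's decoder direction projected through the model's unembedding matrix, and the lists show the lowest- and highest-scoring tokens. The NumPy sketch below illustrates that assumed convention. `top_logit_effects`, `w_dec_row`, `w_u`, and the toy tokens are all hypothetical names.

```python
import numpy as np

def top_logit_effects(w_dec_row, w_u, vocab, k=10):
    """Rank tokens by the direct logit effect of one feature.

    Assumption (not confirmed by the page): effect = w_dec_row @ w_u,
    i.e. the feature's decoder direction projected through the unembedding,
    ignoring the final layer norm and any later layers.
    """
    effects = w_dec_row @ w_u          # one score per vocabulary entry
    order = np.argsort(effects)        # indices sorted from most negative upward
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example: d_model = 2, vocabulary of 4 hypothetical tokens.
w_u = np.array([[1.0, 0.0, -1.0, 0.5],
                [0.0, 1.0, 0.0, -0.5]])
w_dec_row = np.array([1.0, 0.0])
vocab = ["tok_a", "tok_b", "tok_c", "tok_d"]
neg, pos = top_logit_effects(w_dec_row, w_u, vocab, k=2)
print("Negative:", neg)
print("Positive:", pos)
```

Under this reading, a feature that fires on "nothing"-like phrases would push up the logits of "nothing" and its translations, which is consistent with the positive list above.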
Activation Density 0.255% (share of tokens on which the feature is active)
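Activation density is usually reported as the share of tokens in a sample dataset on which the feature's activation is nonzero. The page does not define it, so treat that definition as an assumption. In the minimal NumPy sketch below, `activation_density` and its `threshold` parameter are hypothetical names.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Fraction of token positions where the feature's activation exceeds threshold."""
    acts = np.asarray(acts, dtype=float)
    return float((acts > threshold).mean())

acts = [0.0, 0.0, 1.2, 0.0]   # hypothetical activations over 4 tokens
print(f"{activation_density(acts):.3%}")
```

At 0.255%, the feature would be active on roughly 255 of every 100,000 tokens, so it is a sparse feature.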