Explanations
phrases indicating a lack of action or helplessness
Negative Logits
whoever     -0.17
somewhere   -0.17
居          -0.15
усі         -0.15
991         -0.14
whichever   -0.14
Whoever     -0.14
FTER        -0.14
zel         -0.14
withd       -0.14
Positive Logits
nothing     1.05
nothing     0.94
Nothing     0.93
NOTHING     0.88
Nothing     0.85
nada        0.72
nichts      0.68
rien        0.67
ничего      0.55
anything    0.50
Activations Density 0.298%
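
The dashboard does not say how the density figure is computed. A common reading is the fraction of sampled token positions on which the feature's activation is above zero. The sketch below illustrates that reading only; the function name `activation_density`, the zero threshold, and the sample data are all assumptions, not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of positions where the feature fires.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical data: 1000 token positions, the feature fires on 3 of them.
sample = [0.0] * 997 + [0.4, 1.2, 2.5]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.298% would mean the feature fires on roughly 3 out of every 1000 sampled tokens.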