Explanations
phrases that emphasize existence or presence
Negative Logits
McCart      -0.17
Mastery     -0.16
例          -0.15
aucoup      -0.15
lon         -0.15
elman       -0.14
silent      -0.14
Guy         -0.14
Restricted  -0.14
Silence     -0.14

Positive Logits
no          0.39
nothing     0.29
nobody      0.28
no          0.26
keine       0.24
,no         0.24
geen        0.23
nowhere     0.23
nothing     0.23
kein        0.22
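The two lists above rank the tokens whose logits this feature most decreases and increases. The sketch below assumes these values come from the usual sparse-autoencoder dashboard method: each value is the dot product of the feature's decoder direction with one column of the model's unembedding matrix. That assumption is not confirmed here. Under it, the lists can be reproduced roughly as follows. The function name `top_logits` and the toy vocabulary and weights are hypothetical, and this is not the dashboard's actual code.

```python
from typing import List, Sequence, Tuple


def top_logits(
    decoder_dir: Sequence[float],
    unembed: Sequence[Sequence[float]],  # assumed shape: [d_model][vocab_size]
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most negative and k most positive (token, logit) pairs.

    Assumption: a feature's logit effect on token j is
    sum_i decoder_dir[i] * unembed[i][j].
    """
    d_model = len(decoder_dir)
    # Project the decoder direction through the unembedding, one token at a time.
    logits = [
        sum(decoder_dir[i] * unembed[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    # Sort ascending by logit: the head holds the most negative tokens.
    ranked = sorted(zip(vocab, logits), key=lambda pair: pair[1])
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return negative, positive


if __name__ == "__main__":
    # Hypothetical toy model: d_model = 2, vocabulary of 4 tokens.
    vocab = ["no", "nothing", "silent", "the"]
    decoder_dir = [1.0, -1.0]
    unembed = [
        [0.5, 0.25, -0.25, 0.0],
        [0.0, 0.0, 0.25, 0.0],
    ]
    neg, pos = top_logits(decoder_dir, unembed, vocab, k=2)
    print(neg)  # [('silent', -0.5), ('the', 0.0)]
    print(pos)  # [('no', 0.5), ('nothing', 0.25)]
```

Sorting once and reading from both ends produces the two lists in the same order the dashboard shows them: most negative first, then most positive first.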
Activation Density: 0.076%
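Activation density is presumably the percentage of dataset tokens on which the feature fires, meaning its activation is above zero. The page does not state this definition, so it is an assumption. Under that reading, 0.076% means the feature fires on roughly 76 of every 100,000 tokens. The following is a minimal sketch of that calculation. The name `activation_density` and the `threshold` parameter are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the fraction of tokens where the
    feature is active (activation > 0), expressed as a percentage.
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


if __name__ == "__main__":
    # 76 firing tokens out of 100,000 gives the density shown above.
    acts = [1.0] * 76 + [0.0] * (100_000 - 76)
    print(activation_density(acts))  # 0.076
```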