INDEX
Explanations
references to "mind" or related concepts
Negative Logits
EZ          -0.16
ritz        -0.15
bett        -0.15
lication    -0.15
izzly       -0.14
akedown     -0.14
ost         -0.14
antino      -0.14
aylor       -0.14
eyer        -0.14
Positive Logits
fulness     0.29
lessly      0.26
ustry       0.26
fully       0.26
sets        0.25
/body       0.23
FUL         0.23
fuck        0.23
-num        0.23
meld        0.21
Activations Density: 0.009%