INDEX
Explanations
Instances of the word "sh" in varying contexts, suggesting a focus on expressions of shock or surprise.
Negative Logits
strup    -0.09
stru     -0.08
hend     -0.08
arend    -0.07
mpar     -0.07
imbus    -0.07
.resp    -0.07
wend     -0.07
iaux     -0.07
ysa      -0.07
Positive Logits
es       0.07
warm     0.07
sh       0.06
Bones    0.06
sh       0.06
allow    0.06
ales     0.06
Force    0.06
hybrid   0.06
coun     0.05
Activations Density 0.009%