Explanations
visual descriptions of how things appear
phrases indicating perception or appearance
Negative Logits
kson        -0.77
wu          -0.73
âĹ¼         -0.73
verend      -0.71
venient     -0.70
essional    -0.70
iling       -0.70
learning    -0.69
umbn        -0.69
ricular     -0.69
Positive Logits
suspic      0.76
ahead       0.71
bones       0.69
ynt         0.69
ãĤ¶         0.68
shif        0.68
unbeat      0.67
noses       0.64
awfully     0.64
suspicious  0.64
Activations Density: 0.058%