INDEX
Explanations
statements relating to learning, knowledge, feelings, and attitudes towards various subjects
expressions of learning, emotions, and desires
Negative Logits
su        -0.72
press     -0.69
eli       -0.68
Attempt   -0.67
stru      -0.66
weed      -0.66
cele      -0.65
cephal    -0.64
oug       -0.64
intend    -0.63
Positive Logits
theirs       0.83
them         0.83
something    0.82
THEM         0.78
hers         0.77
nothing      0.74
ļé           0.74
lots         0.73
everything   0.73
plenty       0.71
Activations Density 0.372%