INDEX
Explanations
mentions of feelings and sensory experiences
Negative Logits
arcy     -0.17
orthy    -0.16
adem     -0.13
itm      -0.13
uet      -0.13
otty     -0.13
olik     -0.13
unner    -0.13
ë¡Ŀ      -0.13
okes     -0.13
Positive Logits
somehow    0.27
inexp      0.20
eled       0.18
indef      0.17
peculiar   0.17
sanki      0.17
myster     0.16
Somehow    0.16
sort       0.15
perhaps    0.15
Activations Density: 0.213%