INDEX
Explanations
references to belief and perception in various contexts
Negative Logits
Token     Logit
itura     -0.17
#=        -0.16
ebe       -0.15
754       -0.15
ysi       -0.15
.Match    -0.15
osomal    -0.15
strup     -0.14
icina     -0.14
ordo      -0.14
Positive Logits
Token     Logit
mate      0.18
BUF       0.17
æĦı       0.16
idel      0.16
hire      0.15
alyze     0.15
ãĢħ       0.14
ÏĨο       0.14
ä¹İ       0.14
harm      0.14
Activations Density 0.005%