Explanations
expressions of personal beliefs and emotions
Negative Logits
emoc     -0.16
adera    -0.15
udu      -0.15
ریز      -0.15
noop     -0.15
rys      -0.15
.slim    -0.14
VISION   -0.14
.nlm     -0.14
ύ        -0.14
Positive Logits
him      0.30
ihn      0.26
lui      0.22
Him      0.22
그를     0.21
onun     0.20
ihm      0.20
HIM      0.18
onu      0.17
عنه      0.17
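The page does not say how the positive and negative logit lists above were computed. Lists like these are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and ranking the vocabulary by the result. The sketch below illustrates that technique under that assumption. `logit_effects`, `top_and_bottom`, and the toy matrices are hypothetical and are not real model weights.

```python
# Hypothetical sketch: rank vocabulary tokens by how much one SAE feature's
# decoder direction raises or lowers their logits. Assumes each token's effect
# is the dot product of the decoder vector with that token's unembedding
# column; this is an assumption, not the page's documented method.

def logit_effects(decoder_vec, unembed, vocab):
    """Return (token, effect) pairs for every vocabulary entry.

    decoder_vec: d_model floats (assumed: the feature's decoder row).
    unembed: d_model rows of len(vocab) floats (assumed: W_U, d_model x vocab).
    """
    effects = [0.0] * len(vocab)
    for weight, row in zip(decoder_vec, unembed):
        for t, value in enumerate(row):
            effects[t] += weight * value
    return list(zip(vocab, effects))


def top_and_bottom(effects, k):
    """Split effects into the k most positive and the k most negative tokens."""
    ranked = sorted(effects, key=lambda pair: pair[1])
    bottom = ranked[:k]                 # most negative first
    top = list(reversed(ranked))[:k]    # most positive first
    return top, bottom


# Toy example with made-up numbers (not the real model weights).
toy_vocab = ["him", "her", "it", "noop"]
toy_decoder = [1.0, 0.5]
toy_unembed = [
    [0.2, 0.1, 0.0, -0.1],
    [0.2, -0.2, 0.0, -0.1],
]

top, bottom = top_and_bottom(logit_effects(toy_decoder, toy_unembed, toy_vocab), k=2)
for token, effect in top:
    print(f"positive {token:<6} {effect:.2f}")
for token, effect in bottom:
    print(f"negative {token:<6} {effect:.2f}")
```

A real dashboard would run the same projection over the full vocabulary with a matrix library. The pure-Python loops here only keep the idea visible.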
Activation density: 0.376%
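Activation density is usually read as the fraction of tokens on which a feature is active. Under that reading, 0.376% means the feature fires on roughly 3.76 of every 1,000 tokens. The page does not define what counts as "active," so the sketch below assumes a strictly positive activation. `activation_density` and the toy data are hypothetical.

```python
# Hypothetical sketch: activation density as the share of token positions on
# which the feature's activation exceeds a threshold (assumed to be 0 here).

def activation_density(activations, threshold=0.0):
    """Return the fraction of activations strictly greater than threshold."""
    if not activations:
        raise ValueError("activation_density needs at least one value")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Toy data: 3 firing positions out of 1,000 (made-up values).
toy_acts = [0.0] * 997 + [1.2, 0.4, 3.1]
density = activation_density(toy_acts)
print(f"Activation density: {density:.3%}")  # prints "Activation density: 0.300%"
```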