Explanations
Phrases indicating intentions or actions to assist others.
Negative Logits
Token      Logit
odd        -0.16
ixe        -0.15
ellar      -0.15
izzo       -0.14
asto       -0.14
积         -0.14
ksam       -0.14
axon       -0.14
zá         -0.14
(éĩij      -0.14
Positive Logits
Token      Logit
764        0.16
缼         0.15
rien       0.14
ecd        0.14
(er        0.14
raki       0.14
³          0.14
964        0.13
syndrome   0.13
Syndrome   0.13
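Logit lists like these are commonly produced by projecting the feature's decoder direction through the model's unembedding matrix and reading off the lowest and highest entries. The sketch below assumes that procedure; the function `logit_lens` and the parameter names `decoder_dir`, `unembed`, and `vocab` are hypothetical, and the toy numbers in the usage lines are invented for illustration, not taken from this feature.

```python
from typing import List, Sequence, Tuple

Ranked = List[Tuple[str, float]]


def logit_lens(decoder_dir: Sequence[float],
               unembed: Sequence[Sequence[float]],
               vocab: Sequence[str],
               k: int = 10) -> Tuple[Ranked, Ranked]:
    """Return (most negative, most positive) token effects for one feature.

    Assumption: decoder_dir has length d_model and unembed is a
    d_model x vocab_size matrix stored as a list of rows.
    """
    d_model = len(decoder_dir)
    # Effect on each vocabulary logit: dot product of the decoder direction
    # with the matching column of the unembedding matrix.
    scores = [
        sum(decoder_dir[i] * unembed[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]                   # lowest scores first
    positive = list(reversed(ranked[-k:]))  # highest scores first
    return negative, positive


# Hypothetical toy example: d_model = 2, vocabulary of 4 tokens.
vocab = ["odd", "rien", "syndrome", "axon"]
decoder_dir = [1.0, -0.5]
unembed = [[-0.1, 0.2, 0.1, -0.05],
           [0.1, 0.0, -0.1, 0.1]]
neg, pos = logit_lens(decoder_dir, unembed, vocab, k=2)
print([t for t, _ in neg])  # ['odd', 'axon']
print([t for t, _ in pos])  # ['rien', 'syndrome']
```

In a real model the same projection is usually a single matrix-vector product over a vocabulary of tens of thousands of tokens; the nested loop here only keeps the sketch dependency-free.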
Activation Density: 0.011%
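Assuming activation density means the share of token positions on which the feature's activation is above zero, 0.011% corresponds to roughly 11 firing positions per 100,000 tokens. A minimal sketch of that calculation follows; `activation_density` and the zero `threshold` are hypothetical choices, and the toy activations are invented.

```python
from typing import Sequence


def activation_density(activations: Sequence[float],
                       threshold: float = 0.0) -> float:
    """Percentage of token positions where the feature fires.

    Assumption: "fires" means activation strictly above `threshold`.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical toy data: 11 active positions out of 100,000 tokens.
acts = [0.0] * 99_989 + [1.0] * 11
density = activation_density(acts)
print(f"{density:.3f}%")  # 0.011%
```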