Explanations
phrases relating to personal experience and emotional expression
Negative Logits
zew         -0.17
esh         -0.17
regnum      -0.14
overall     -0.14
alto        -0.14
句          -0.14
altogether  -0.13
まま        -0.13
itself      -0.13
ito         -0.13
Positive Logits
میلادی      0.19
/on         0.15
eward       0.15
gnore       0.15
Bowen       0.15
itals       0.14
ürk         0.14
ghan        0.14
iaux        0.14
번          0.14
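The Positive and Negative Logits lists rank vocabulary tokens by how strongly this feature pushes their output logits up or down. A common way SAE dashboards compute such lists (an assumption here: the source does not say how these values were produced, or whether any LayerNorm scaling was folded in) is to project the feature's decoder direction through the model's unembedding matrix and keep the k highest and k lowest scores. The sketch below is pure Python; `feature_logit_effects`, `decoder_direction`, and `unembed` are hypothetical names chosen for illustration.

```python
from typing import List, Sequence, Tuple


def feature_logit_effects(
    decoder_direction: Sequence[float],
    unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and k most negative token logit effects.

    Hypothetical helper. decoder_direction: the feature's decoder vector (length d_model).
    unembed: d_model rows, each with one entry per vocabulary token.
    vocab: token strings, one per unembedding column.
    """
    if len(decoder_direction) != len(unembed):
        raise ValueError("decoder_direction and unembed must share d_model")
    # Logit effect for token t: dot product of the decoder direction with column t of unembed.
    scores = [
        sum(d * row[t] for d, row in zip(decoder_direction, unembed))
        for t in range(len(vocab))
    ]
    # Sort ascending by score, so the front holds the most negative effects.
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]
    positive = ranked[::-1][:k]  # reversed, so the largest score comes first
    return positive, negative


if __name__ == "__main__":
    toy_vocab = ["a", "b", "c"]
    toy_unembed = [[1.0, 0.0, -1.0], [0.0, 2.0, 0.0]]  # d_model=2, vocab_size=3
    pos, neg = feature_logit_effects([0.5, 0.1], toy_unembed, toy_vocab, k=1)
    print("positive:", pos)  # positive: [('a', 0.5)]
    print("negative:", neg)  # negative: [('c', -0.5)]
```

With a real model, `decoder_direction` would be the feature's row of the SAE decoder and `unembed` the model's unembedding matrix. A tensor library would normally replace the nested loops with a single matrix-vector product.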
Activation Density: 0.668%
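Activation density is usually reported as the share of sampled tokens on which the feature fires, meaning its activation is above zero. Assuming that definition (the source does not spell it out), 0.668% means the feature is active on roughly 1 token in 150. A minimal sketch using a hypothetical helper, `activation_density`:

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation is strictly above `threshold`.

    Hypothetical helper; assumes a feature "fires" when its activation exceeds zero
    (the default threshold).
    """
    total = 0
    active = 0
    for value in activations:
        total += 1
        if value > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total


if __name__ == "__main__":
    # One of four tokens is active, so the density is 25%.
    print(activation_density([0.0, 1.2, 0.0, 0.0]))  # 25.0
    # A density of 0.668% is about one active token in every 1 / 0.00668 tokens.
    print(round(1 / 0.00668))  # 150
```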