INDEX
Explanations
expressions of emotional conflict or introspection
Negative Logits
Token    Logit
awah     -0.19
erras    -0.17
Grim     -0.14
ikh      -0.14
ystick   -0.14
YY       -0.14
alion    -0.13
alim     -0.13
εξ       -0.13
ura      -0.13
Positive Logits
Token    Logit
baby     0.15
ahn      0.15
Erg      0.15
ainen    0.14
Every    0.14
орт      0.14
ici      0.14
erg      0.14
icit     0.14
Hog      0.13
Activations Density 0.016%