INDEX
Explanations
expressions of opinions or feelings about individuals or events
Negative Logits
erno        -0.16
lic         -0.16
enberg      -0.15
wer         -0.15
allegedly   -0.15
чем         -0.14
enci        -0.14
bout        -0.14
ency        -0.13
imu         -0.13
Positive Logits
是一个      0.38
是个        0.36
是一        0.30
sebuah      0.26
an          0.23
一个        0.21
—a          0.21
eine        0.20
一种        0.20
een         0.20
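The page does not say how the two logit lists are produced. A common approach for sparse-autoencoder dashboards is to take the feature's decoder direction and dot it with each token's column of the unembedding matrix W_U. That approach is assumed here, not confirmed by the source. The tokens with the most negative and most positive results are then listed. The helpers `logit_effects` and `top_logits`, and the toy numbers in the usage block, are hypothetical illustrations and are not taken from this feature.

```python
# Hypothetical sketch: the page does not state how its logit lists are built.
# Assumption: a feature's direct effect on each output token is its decoder
# direction dotted with that token's column of the unembedding matrix W_U.

from typing import List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Direct logit effect on every vocabulary entry.

    `unembed` is W_U laid out as d_model rows, each with d_vocab columns.
    """
    if len(decoder_dir) != len(unembed):
        raise ValueError("decoder_dir length must equal d_model")
    d_vocab = len(unembed[0])
    # For each token t, sum decoder_dir[j] * W_U[j][t] over the model dimension j.
    return [
        sum(w * row[t] for w, row in zip(decoder_dir, unembed))
        for t in range(d_vocab)
    ]


def top_logits(effects: Sequence[float], vocab: Sequence[str], k: int = 10
               ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k lowest-ranked and the k highest-ranked (token, effect) pairs."""
    if len(effects) != len(vocab):
        raise ValueError("effects and vocab must have the same length")
    # Sort ascending by effect: the front holds the most negative tokens,
    # and the reversed list puts the most positive tokens first.
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    most_negative = ranked[:k]
    most_positive = ranked[::-1][:k]
    return most_negative, most_positive


if __name__ == "__main__":
    # Toy numbers for illustration only; they are not this feature's weights.
    vocab = ["a", "an", "the", "allegedly"]
    decoder_dir = [1.0, 0.5]
    unembed = [[0.2, 0.3, 0.0, -0.1],
               [0.0, 0.2, 0.1, -0.2]]
    neg, pos = top_logits(logit_effects(decoder_dir, unembed), vocab, k=2)
    print("negative:", [tok for tok, _ in neg])
    print("positive:", [tok for tok, _ in pos])
```

With these toy weights, "allegedly" ends up at the bottom of the ranking and "an" at the top. On a real model the same ranking would run over the full vocabulary, which is why subword fragments such as "erno" or "enci" can appear alongside whole words.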
Activation Density: 0.218%
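Read as a percentage, a density of 0.218% corresponds to roughly 218 firing positions per 100,000 tokens. The sketch below assumes that a position counts as "firing" when its activation is strictly above zero. The dashboard may use a different threshold or sample of text, and `activation_density` is a hypothetical helper rather than part of any named tool.

```python
# Hypothetical sketch: assumes "density" means the share of token positions
# whose activation exceeds a threshold (0 by default), reported as a percentage.

from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions where the activation exceeds `threshold`."""
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


if __name__ == "__main__":
    # 218 firing positions out of 100,000 tokens gives a density of 0.218%.
    acts = [0.0] * (100_000 - 218) + [1.5] * 218
    print(f"{activation_density(acts):.3f}%")
```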