INDEX
Explanations
instances of reported speech or quotations
Negative Logits
Token            Logit
hus              -0.16
idlo             -0.15
elop             -0.15
aram             -0.15
antan            -0.15
------+------+   -0.14
.mas             -0.14
ìĨĮëħĦ           -0.14
üz               -0.14
ãĥ³ãĤ¹           -0.14
Positive Logits
Token            Logit
yb               0.17
reporters        0.16
ghi              0.15
ivar             0.15
ÙĴس             0.14
us               0.14
OUNDS            0.14
ahr              0.14
oui              0.14
ousel            0.14
Activations Density 0.029%