INDEX
Explanations
references to political events and sports outcomes
Negative Logits
eton     -0.14
ncy      -0.14
bone     -0.14
omal     -0.13
uida     -0.13
raid     -0.13
dược     -0.13
mmc      -0.13
.pres    -0.13
uria     -0.13
POSITIVE LOGITS
score      0.18
рах        0.18
nil        0.18
ware       0.18
Score      0.17
рахунок    0.17
WARE       0.17
-nil       0.16
счет       0.16
rames      0.15
Activations Density: 0.139%