Explanations
references to political events or figures
following names or locations
Negative Logits
utafitiHapana    -0.47
rungsseite       -0.47
sional           -0.46
aspect           -0.45
著               -0.45
destroyAll       -0.43
Normdatei        -0.43
lair             -0.42
avajillas        -0.41
vosť             -0.41
Positive Logits
among            1.00
among            0.97
和其他           0.96
amongst          0.95
AMONG            0.89
joined           0.83
parmi            0.81
Amongst          0.81
joins            0.80
وغير             0.80
Activations Density: 0.401%