INDEX
Explanations
references to governmental entities and officials
Negative Logits
cop       -0.16
imes      -0.16
ÑĥÑī      -0.16
odb       -0.15
izen      -0.15
enser     -0.15
ibaba     -0.14
ìĹĶ       -0.14
ifers     -0.14
rox       -0.14
Positive Logits
amental   0.20
ador      0.20
-General  0.19
ance      0.19
à¥įह      0.19
ments     0.17
bodies    0.17
aret      0.17
atore     0.17
-general  0.17
Activations Density: 0.020%