INDEX
Explanations
the presence of governmental or authoritative entities in a text
Negative Logits
defgroup    -0.16
ãĥ¼ãĤ¯    -0.15
ech    -0.14
νο    -0.14
è·    -0.14
("%    -0.13
sám    -0.13
à¸ķา    -0.13
Shapes    -0.13
Alt    -0.13
Positive Logits
ingers    0.16
opor    0.15
themselves    0.15
们    0.15
šak    0.14
inger    0.14
uma    0.14
875    0.14
tesy    0.14
enberg    0.14
Activations Density 0.141%