INDEX
Explanations
phrases indicating legal or political contexts
Negative Logits
overall    -0.15
also       -0.15
403        -0.14
later      -0.14
Of         -0.14
μη         -0.14
ping       -0.13
later      -0.13
út         -0.13
ãĤ¡        -0.13
Positive Logits
Ñĥков      0.15
-Nazi      0.14
GMT        0.14
oller      0.14
oga        0.14
dın        0.13
ãĥ¼ãĥĦ     0.13
ISIBLE     0.13
steder     0.13
ÏģÏī       0.13
Activations Density: 0.064%