INDEX
Explanations
references to specific political figures or public personalities
Negative Logits
zew        -0.19
dete       -0.16
SES        -0.16
вай        -0.15
кеÑĤ       -0.15
ses        -0.15
Readable   -0.15
éŀ         -0.15
aktu       -0.14
CUS        -0.14
Positive Logits
indent     0.16
Indent     0.16
éİ®        0.15
indent     0.15
Tmax       0.15
-indent    0.14
Sha        0.14
æ½         0.14
873        0.14
ubbo       0.14
Activations Density 0.003%