INDEX
Explanations
references to dishonesty and falsehoods in political discourse
Negative Logits
ÑĪки    -0.17
iba    -0.15
ÑĤоÑĤ    -0.14
еÑģи    -0.14
åº    -0.14
Stub    -0.14
ÑĢоÑī    -0.14
755    -0.14
_hooks    -0.13
uctor    -0.13
Positive Logits
Perkins    0.14
CJK    0.14
ae    0.14
Q    0.14
dex    0.14
_gettime    0.13
Oaks    0.13
atto    0.13
porr    0.13
inconvenience    0.13
Activations Density: 0.148%