INDEX
Explanations
deceptive representations and claims in political contexts
Negative Logits
mund          -0.52
Introduced    -0.50
otomy         -0.49
Reserv        -0.48
dess          -0.47
Teks          -0.46
BEAUTY        -0.46
kao           -0.46
introduce     -0.46
BACKUP        -0.45
Positive Logits
utafitiHapana      0.65
falsely            0.65
المعيارى           0.65
تانيه              0.65
ConstraintMaker    0.63
RegressionTest     0.61
DockStyle          0.58
findpost           0.58
참고               0.56
AssemblyCulture    0.56
Activations Density 0.300%