INDEX
Explanations
discussions about truth, false claims, and accusations
True/false statements
identifying false statements
NEGATIVE LOGITS
AndEndTag         -0.69
виправивши        -0.68
OGND              -0.68
EconPapers        -0.67
>",               -0.67
LookAnd           -0.59
transpa           -0.57
]**               -0.55
})));             -0.55
ProgressDialog    -0.54
POSITIVE LOGITS
untrue            1.39
false             1.16
false             1.13
incorrect         1.12
FALSE             1.06
False             1.03
False             1.03
falso             1.01
falsehood         1.00
FALSE             1.00
Activations Density 0.280%
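
As a rough illustration of how a density figure like the one above can be read, the sketch below computes the fraction of token positions on which a feature fires. It assumes that "density" means the share of positions with an activation above zero; the function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: a position counts as "active" when its activation is strictly
    greater than `threshold` (0.0 by default).
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical data: 1000 token positions, 3 of which activate the feature.
sample = [0.0] * 997 + [0.4, 1.2, 2.5]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.300%
```

Under this reading, a density of 0.280% would correspond to the feature firing on roughly 28 of every 10,000 token positions.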