Explanations
information related to factual statements and claims
Negative Logits
surla           -0.79
BagConstraints  -0.54
rlpool          -0.50
⤹               -0.49
EconPapers      -0.48
explique        -0.48
Normdatei       -0.47
enciaga         -0.46
endphp          -0.45
VersionUID      -0.45
Positive Logits
fact   0.73
fact   0.69
FACT   0.66
FACT   0.63
Fact   0.59
facts  0.58
facts  0.58
Fact   0.57
Facts  0.54
truth  0.51
Activations Density 1.743%
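
The density figure above is commonly read as the fraction of token positions on which the feature fires at all. The following is a minimal sketch under that assumption; the function name `activation_density`, the `threshold` parameter, and the sample values are hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of tokens with a non-zero
    activation. The dashboard does not state its exact definition.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical activations for one feature over 8 token positions (made-up data).
sample = [0.0, 0.0, 1.2, 0.0, 0.0, 0.0, 0.0, 0.4]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 1.743% would mean the feature is active on roughly 17 of every 1,000 tokens in the evaluation text.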