INDEX
Explanations
assertions regarding the truthfulness or falsehood of statements
Negative Logits
Token             Logit
ब्रेकडाउन          -0.52
InputBorder       -0.47
attempts          -0.45
Portail           -0.43
jahre             -0.41
ItemBackground    -0.40
енча              -0.40
AntiForgeryToken  -0.39
mbolos            -0.39
pautas            -0.39
Positive Logits
Token             Logit
untrue            0.75
truth             0.75
TRUTH             0.69
truth             0.68
Truth             0.66
FALSE             0.65
Truth             0.64
truths            0.63
FALSE             0.59
TRUE              0.59
Activations Density 0.931%