INDEX
Explanations
statements and claims regarding truthfulness and accuracy in various contexts
Negative Logits
Token      Logit
clearfix   -0.15
chie       -0.15
Fraud      -0.15
ocom       -0.14
klar       -0.14
obot       -0.14
icas       -0.14
Remed      -0.14
Cheat      -0.13
onet       -0.13
Positive Logits
Token      Logit
accurate   0.41
accuracy   0.39
correct    0.39
true       0.35
accur      0.34
Accuracy   0.34
accur      0.31
accuracy   0.31
true       0.31
truth      0.30
Activations Density: 0.187%