Explanations
statements that challenge the truthfulness or credibility of various claims and accusations
Negative Logits
clearfix      -0.16
-CN           -0.15
removeAttr    -0.15
olle          -0.14
_managed      -0.14
chie          -0.14
heid          -0.13
otte          -0.13
managed       -0.13
aco           -0.13
Positive Logits
valid         0.40
accurate      0.39
correct       0.36
-valid        0.34
valid         0.34
accuracy      0.33
valide        0.33
validity      0.32
Valid         0.31
.valid        0.30
Activations Density 0.257%