INDEX
Explanations
instances of deception or falsehood in various contexts
NEGATIVE LOGITS
Token          Logit
GEBURTSDATUM   -0.73
.              -0.69
↵↵             -0.65
↵              -0.63
!              -0.62
5              -0.62
(              -0.57
?              -0.56
olsun          -0.56
3              -0.56
POSITIVE LOGITS
Token          Logit
eſ             0.91
YourGuide      0.90
ſſer           0.89
getF           0.86
]='\           0.85
uests          0.85
ghijklmnop     0.84
IBLIO          0.84
arangay        0.84
itsubishi      0.84
Activations Density 1.017%