Explanations
phrases or words related to falsehoods, lies, or deception
the occurrence of the term "false" in various contexts
Negative Logits
Token               Logit
hens                -0.94
guiActiveUnfocused  -0.81
hem                 -0.78
xual                -0.77
rador               -0.73
shed                -0.72
onen                -0.71
icans               -0.71
oké                 -0.71
nesota              -0.70
Positive Logits
Token               Logit
positives           1.24
dich                1.03
equival             0.98
guiIcon             0.96
accusation          0.95
alarms              0.94
negatives           0.91
guiActiveUn         0.89
allegation          0.85
accusations         0.84
Activations Density: 0.034%