INDEX
Explanations
words related to unconfirmed or disputed claims
Negative Logits
adan          -0.91
utics         -0.79
oton          -0.77
ctors         -0.77
lished        -0.75
uden          -0.74
atever        -0.74
cair          -0.74
ertodd        -0.74
perature      -0.72
Positive Logits
inability         0.96
impossibility     0.94
contradiction     0.90
culprit           0.88
absence           0.86
lack              0.86
failings          0.83
contradictions    0.83
perpetrator       0.81
innocence         0.80
Activations Density: 0.126%