Explanations
words related to contradictions
references to contradictions and inconsistencies
Negative Logits
Token     Logit
undai     -0.88
otin      -0.87
rol       -0.85
obbies    -0.83
odes      -0.81
asted     -0.79
gone      -0.78
thel      -0.75
iatric    -0.75
aldi      -0.74
Positive Logits
Token            Logit
contradiction    1.04
xual             1.03
hift             0.81
paradox          0.80
contradictory    0.79
inconsistency    0.77
naire            0.76
contradictions   0.75
juxtap           0.75
contradict       0.73
Activations Density: 0.011%