INDEX
Explanations
contradictions between different pieces of information or statements
statements and phrases that express contradiction or refutation
Negative Logits
emetery       -0.79
NetMessage    -0.76
anny          -0.75
ussed         -0.71
actionGroup   -0.67
itizens       -0.66
Eye           -0.64
ammy          -0.64
umbered       -0.62
hene          -0.62
Positive Logits
corrobor      0.98
contradict    0.97
substant      0.93
dispro        0.89
debunk        0.84
assertions    0.84
refute        0.83
hypotheses    0.78
debunked      0.76
unfounded     0.76
Activations Density: 0.047%