Explanations
Arguments and claims related to misconceptions, falsehoods, and controversies.
Negative Logits
Token           Logit
dos             -0.78
artney          -0.77
ktop            -0.75
Interstitial    -0.74
ophon           -0.68
actionGroup     -0.67
incinn          -0.66
zie             -0.65
mes             -0.64
eteria          -0.64
Positive Logits
Token           Logit
assertions       1.15
assertion        1.13
debunked         1.12
belief           1.06
icist            1.06
validity         1.04
assumptions      1.03
untrue           1.02
falsehood        0.99
debunk           0.97
Activation density: 5.079%