INDEX
Explanations
scientific and analytical terms related to hypotheses, theories, assumptions, and arguments
assertions or hypotheses that relate to social theories and misconceptions
Negative Logits
aths        -0.75
iencies     -0.72
Chains      -0.72
quished     -0.72
downtime    -0.71
curfew      -0.70
orers       -0.70
Nanto       -0.70
dule        -0.68
Kens        -0.67
Positive Logits
unfounded   1.15
baseless    1.13
valid       1.09
refuted     1.01
echoed      1.01
nonsense    1.00
false       1.00
untrue      0.99
debunked    0.97
incorrect   0.97
Activations Density: 0.460%