Explanations
claims or statements that are unsupported or unsubstantiated
Negative Logits
actionGroup   -0.90
ktop          -0.76
isites        -0.74
itaire        -0.72
ivation       -0.72
ahime         -0.70
mobility      -0.69
ivating       -0.69
rontal        -0.68
itect         -0.68
Positive Logits
debunked        1.30
falsehood       1.23
assertions      1.20
False           1.18
Claim           1.14
debunk          1.13
untrue          1.13
misinformation  1.13
baseless        1.11
claims          1.10
Activations Density 0.587%