INDEX
Explanations
phrases indicating support or refutation of ideas or claims
discussions related to debunking myths or unsupported claims
Negative Logits
ufact             -0.85
actionGroup       -0.73
unfocusedRange    -0.67
itaire            -0.67
rha               -0.66
ophon             -0.66
icult             -0.65
oided             -0.65
ishable           -0.64
Suc               -0.64
Positive Logits
assertions        1.62
claims            1.56
claim             1.48
assertion         1.47
hypotheses        1.42
accusations       1.37
notion            1.35
allegation        1.35
assumptions       1.35
allegations       1.33
Activations Density: 0.573%