Explanations
evidence or arguments supporting or refuting different ideas or claims
phrases that support or refute popular perceptions or notions
Negative Logits
actionGroup       -0.79
ufact             -0.71
Browse            -0.71
unfocusedRange    -0.70
GOODMAN           -0.70
Events            -0.69
overcame          -0.68
phabet            -0.67
··                -0.66
meet              -0.65
Positive Logits
assertion         1.69
assertions        1.66
claim             1.65
notion            1.61
hypothesis        1.60
claims            1.59
hypotheses        1.55
assumption        1.53
allegation        1.47
theories          1.47
Activations Density: 0.444%