INDEX
Explanations
adjectives indicating quantity or degree such as "much," "a lot," and "so much."
Negative Logits
Token       Logit
ares        -0.85
aves        -0.84
breakers    -0.79
Ts          -0.75
buster      -0.75
ends        -0.73
runs        -0.73
save        -0.73
agents      -0.72
iques       -0.71
Positive Logits
Token          Logit
overlap        1.17
evidence       1.05
disagreement   1.04
confusion      1.03
similarity     1.03
uncertainty    1.00
discrepancy    0.99
ambiguity      0.98
possibility    0.98
indication     0.97
Activations Density: 0.136%
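The page does not say how the density figure is computed. One common reading, assumed here, is the fraction of token positions on which the feature's activation is nonzero. The sketch below illustrates that reading only; the function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and are not taken from the source.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the feature fires.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 136 active positions out of 100,000 tokens.
sample = [0.0] * 99_864 + [2.5] * 136
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this assumption, a density of 0.136% would mean the feature fires on roughly 1 token in every 735.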