Explanations
phrases indicating uncertainty or conditionality
phrases that convey conditional or uncertain relationships
Negative Logits
fw       -0.78
iry      -0.71
banks    -0.69
bah      -0.68
iti      -0.68
ults     -0.68
bern     -0.67
DF       -0.67
oru      -0.65
met      -0.65
Positive Logits
necessarily   0.81
equate        0.74
anymore       0.73
correlate     0.69
correlated    0.69
portrayed     0.68
confined      0.68
indicative    0.68
advers        0.68
entail        0.68
Activations Density: 0.033%