INDEX
Explanations
warnings or advice given in text
punctuation and expressions of caution or warnings
Negative Logits
pires       -0.85
izable      -0.63
itud        -0.62
disparate   -0.62
differed    -0.57
matched     -0.57
contrasts   -0.57
progressed  -0.56
paralle     -0.56
manifested  -0.56
Positive Logits
Otherwise   1.19
Otherwise   0.99
lest        0.98
unless      0.95
THEY        0.89
Especially  0.88
otherwise   0.88
Seriously   0.85
Unless      0.85
because     0.83
Activations Density 0.400%