INDEX
Explanations
phrases that contrast or emphasize a difference
phrases indicating cause and effect relationships
Negative Logits
Token      Logit
Heath      -0.64
Python     -0.63
ory        -0.61
Scene      -0.57
oft        -0.56
ct         -0.55
Orlando    -0.54
¶          -0.54
Babel      -0.52
Mex        -0.52
Positive Logits
Token          Logit
etheless       1.17
nonetheless    1.05
nevertheless   0.82
interstitial   0.70
downright      0.68
darn           0.68
incre          0.67
conclud        0.66
ankind         0.66
damn           0.64
Activations Density 0.159%
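The density figure is presumably the fraction of sampled token positions on which this feature fires. Here is a minimal sketch under that assumption. The function name `activation_density`, the zero threshold, and the sample data are hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of sampled tokens on which the
    feature is active. The dashboard's exact definition is not stated.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 159 of 100,000 token positions.
acts = [0.0] * 100_000
for i in range(159):
    acts[i] = 1.5

density = activation_density(acts)
print(f"{density:.3%}")
```

A density this low would mean the feature is sparse: it is silent on the vast majority of tokens and fires only in specific contexts, such as the contrastive phrases named in the explanations.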