Explanations
phrases related to providing context or reasoning
phrases indicating conditional or causal relationships

Negative Logits
Token            Logit
YING             -0.73
eware            -0.72
unless           -0.71
ievers           -0.68
iband            -0.68
fits             -0.68
arter            -0.66
_-               -0.66
Bind             -0.65
thinking         -0.65

Positive Logits
Token            Logit
circumstances     1.27
nature            1.17
recent            1.16
seriousness       1.11
popularity        1.11
propensity        1.10
proximity         1.07
severity          1.05
history           1.04
prevalence        1.04
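
In SAE feature dashboards, logit lists like these are commonly obtained by projecting the feature's decoder direction through the model's unembedding matrix: the highest scores are the tokens the feature most directly promotes, the lowest are the tokens it most suppresses. The sketch below illustrates that ranking under stated assumptions. It uses a hypothetical 3-dimensional decoder vector and a hypothetical 5-token unembedding, not this feature's actual weights, and it ignores any final layer norm or bias the real model may apply.

```python
# Hypothetical feature decoder direction (length d_model = 3); not real weights.
decoder_direction = [0.5, -1.0, 2.0]

# Hypothetical unembedding: each token maps to a length-d_model column of W_U.
unembed = {
    "token_a": [0.2, -0.4, 0.5],
    "token_b": [0.0, 0.0, 0.5],
    "token_c": [0.0, 1.0, -0.1],
    "token_d": [0.4, 0.5, -0.2],
    "token_e": [0.0, 0.3, 0.0],
}


def logit_effects(direction, unembed):
    """Dot product of the decoder direction with each token's unembedding column."""
    return {
        tok: sum(d * u for d, u in zip(direction, vec))
        for tok, vec in unembed.items()
    }


def top_and_bottom(effects, k):
    """Return the k most positive and the k most negative (token, logit) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1], reverse=True)
    top = ranked[:k]             # largest logit first
    bottom = ranked[::-1][:k]    # most negative logit first
    return top, bottom


top, bottom = top_and_bottom(logit_effects(decoder_direction, unembed), k=2)
print("positive:", top)
print("negative:", bottom)
```

Ranking a single score per vocabulary token is what produces two lists of the same length, one for each end of the distribution, as in the tables above.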

Activations Density: 0.153%
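
Activation density is typically the fraction of tokens in a sample dataset on which the feature fires. The minimal sketch below assumes a flat list of per-token activations and a hypothetical firing threshold of 0 (strictly positive counts as active). Under that definition, 0.153% corresponds to roughly 153 active tokens per 100,000.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds the (assumed) threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 100,000 tokens, of which 153 have a positive activation.
sample = [0.0] * 99_847 + [1.0] * 153
density = activation_density(sample)
print(f"Activations Density: {density:.3%}")
```

The `%` format type multiplies by 100 and appends a percent sign, so a density of 0.00153 is displayed as 0.153%.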