INDEX
Explanations
negations using "will not be."
the phrase "will not be" or similar variations indicating negation or refusal
Negative Logits
strous       -0.67
guise        -0.64
rower        -0.63
traverse     -0.62
Might        -0.61
anos         -0.61
srfAttach    -0.60
hail         -0.59
clarify      -0.58
compose      -0.58
Positive Logits
able          1.19
anymore       1.11
bothered      1.09
counted       0.93
necessarily   0.88
tolerated     0.87
anywhere      0.84
entirely      0.84
allowed       0.84
remotely      0.83
Activations Density 0.121%
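
As a rough illustration of how a density figure like the one above could be computed, the sketch below assumes that "density" means the fraction of token positions on which the feature's activation is nonzero. The page does not state its definition, so this is an assumption. The function name `activation_density` and the synthetic data are hypothetical and not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: density = (active positions) / (all positions).
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Synthetic example (not real data): 121 active positions out of 100,000.
sample = [1.0] * 121 + [0.0] * (100_000 - 121)
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.121%
```

Under this reading, a density of 0.121% would mean the feature fires on roughly 1 in every 830 tokens.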