INDEX
Explanations
the word "won't"
negations or expressions of refusal
Negative Logits
Token          Logit
Kings          -0.66
Must           -0.66
Communities    -0.65
Liter          -0.65
ancer          -0.65
Intern         -0.65
Strategy       -0.65
Aren           -0.65
Measures       -0.64
Pure           -0.63
Positive Logits
Token          Logit
necessarily    1.22
bud            1.02
bother         1.02
icably         1.00
be             0.92
spoil          0.89
hesitate       0.89
tolerate       0.87
exactly        0.87
icable         0.81
Activations Density 0.050%