INDEX
Explanations
Contractions common in English, such as "don't" or "didn't"
Negations related to necessity or obligation
NEGATIVE LOGITS
Adin         -0.67
forms        -0.66
integrity    -0.63
Intern       -0.60
Integrity    -0.59
Strikes      -0.57
affinity     -0.57
heartedly    -0.57
Alas         -0.57
Fu           -0.55
POSITIVE LOGITS
need          1.07
hear          0.96
gotta         0.92
wanna         0.90
NEED          0.88
expect        0.85
yourselves    0.83
want          0.83
yourself      0.81
necessarily   0.81
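Logit lists like these are commonly produced by projecting a feature's decoder direction onto the model's unembedding matrix and ranking every vocabulary token by the result. This page does not state its method, so what follows is only a minimal sketch under that assumption. `logit_effects`, `top_and_bottom`, and the toy vectors are hypothetical. The sketch uses plain Python lists rather than any particular interpretability library.

```python
def logit_effects(decoder_dir, unembed, vocab):
    """Hypothetical: dot the feature's decoder direction with each token's
    unembedding column. `unembed` is indexed as unembed[d_model][vocab]."""
    d_model = len(decoder_dir)
    return {
        tok: sum(decoder_dir[i] * unembed[i][j] for i in range(d_model))
        for j, tok in enumerate(vocab)
    }


def top_and_bottom(effects, k):
    """Return the k most negative and the k most positive (token, effect) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by effect
    bottom = ranked[:k]                                      # most negative first
    top = sorted(ranked[-k:], key=lambda kv: kv[1], reverse=True)  # most positive first
    return bottom, top


# Usage with a few of the values listed above (treated here as precomputed effects).
listed = {"need": 1.07, "hear": 0.96, "gotta": 0.92,
          "Adin": -0.67, "forms": -0.66, "integrity": -0.63}
neg, pos = top_and_bottom(listed, 2)
print(neg)  # [('Adin', -0.67), ('forms', -0.66)]
print(pos)  # [('need', 1.07), ('hear', 0.96)]
```

Sorting once and slicing both ends keeps the sketch short. For a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nsmallest` / `heapq.nlargest` would avoid a full sort.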
Activation Density 0.123%
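Activation density is usually read as the share of token positions on which the feature is active. At 0.123%, that is roughly 1 to 2 firings per thousand tokens. The sketch below assumes "active" means an activation strictly above a threshold of 0. That threshold, and the helper name `activation_density`, are hypothetical and not taken from this page.

```python
import math


def activation_density(activations, threshold=0.0):
    """Hypothetical helper: percentage of token positions where the feature's
    activation is strictly greater than `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# Example: 123 firing positions out of 100,000 tokens.
acts = [0.0] * 99_877 + [0.5] * 123
density = activation_density(acts)
print(math.isclose(density, 0.123))  # True
```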