Explanations
decisions or actions indicating a negative choice or outcome
negative expressions or phrases indicating opposition or rejection
Negative Logits
verbs        -0.71
initialized  -0.67
dashed       -0.67
perfect      -0.67
Weak         -0.67
illiter      -0.64
\\\\\\\\     -0.64
Basics       -0.63
Perfect      -0.62
debunked     -0.62
Positive Logits
renew        1.46
bother       1.27
pursue       1.21
participate  1.14
anymore      1.04
proceed      1.04
attend       1.02
continue     0.98
bud          0.96
tolerate     0.96
Activations Density 0.213%