Explanations
verbs related to movement or progression
various themes related to challenges and obstacles
Negative Logits
pc        -0.57
razil     -0.55
aughed    -0.51
didnt     -0.49
uld       -0.48
ori       -0.48
ĨĴ        -0.47
emale     -0.46
HW        -0.46
doesnt    -0.46
Positive Logits
.         1.07
.—        0.94
.</       0.91
—         0.90
.):       0.88
.?        0.88
—         0.87
.)        0.86
.*        0.84
.''.      0.84
Activations Density: 1.128%