Explanations
phrases expressing feelings or comparisons using "like."
Negative Logits
myſelf            -1.04
itſelf            -1.03
ſtate             -0.98
ThroughAttribute  -0.96
Majefty           -0.94
Efq               -0.93
themſelves        -0.92
fhew              -0.90
himſelf           -0.89
purpoſe           -0.85
Positive Logits
a                 1.09
the               1.05
it                0.87
someone           0.85
an                0.85
they              0.78
something         0.71
that              0.69
there             0.67
EconPapers        0.66
Activations Density 0.057%