INDEX
Explanations
references to a quantity
Follows "of" and precedes a pronoun
of [determiner/pronoun]
Negative Logits
pleaſure      -0.88
purpoſe       -0.86
itſelf        -0.86
himſelf       -0.85
raiſ          -0.82
fhort         -0.82
reaſon        -0.80
Houſe         -0.79
cauſe         -0.79
themſelves    -0.77
Positive Logits
us            0.83
the           0.79
these         0.70
them          0.64
it            0.64
their         0.63
his           0.61
its           0.60
those         0.57
our           0.57
Activations Density 0.194%
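
For readers unfamiliar with the density figure, here is a minimal sketch of one common way such a number could be computed: the fraction of token positions on which the feature's activation is above zero. The page does not say how its value was derived, so the function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and chosen only to illustrate the arithmetic.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of sampled tokens on which the
    feature fires at all; the source page does not define it.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 97 of 50,000 token positions.
sample = [1.5] * 97 + [0.0] * 49_903
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.194% means the feature is active on roughly 2 of every 1,000 tokens sampled.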