Explanations
phrases indicating ownership or possession
Negative Logits
itſelf      -1.78
myſelf      -1.70
pleaſure    -1.70
Efq         -1.69
houſe       -1.62
ſelf        -1.61
faſt        -1.61
ſtate       -1.59
ſelves      -1.58
purpoſe     -1.57
Positive Logits
a           1.10
(blank)     1.07
and         1.07
<eos>       1.05
in          1.05
an          1.05
is          1.05
the         1.05
I           1.04
for         0.94
Activations Density 3.562%