INDEX
Explanations
conjunctions and relational language indicating connections or associations
Negative Logits
Monfieur     -1.38
houſe        -1.33
Efq          -1.31
purpoſe      -1.28
ſtand        -1.28
myſelf       -1.26
auffi        -1.26
Vidite       -1.26
pleaſure     -1.25
itſelf       -1.24
Positive Logits
<eos>                0.77
(token not shown)    0.75
I                    0.72
</i>                 0.71
P                    0.69
E                    0.65
↵↵                   0.65
ap                   0.64
F                    0.64
a                    0.63
Activations Density 0.842%
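
As a rough illustration of how a density figure like the one above can be read, the sketch below assumes that activation density means the fraction of token positions on which the feature's activation is above zero. That definition, the function name `activation_density`, and the sample data are assumptions made for illustration; they are not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions where the activation exceeds threshold.

    Assumption: "density" is the share of positions on which the feature fires.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: one feature over 1000 tokens, firing on 8 of them.
sample = [0.0] * 992 + [1.5] * 8
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.800%
```

Under this reading, a density of 0.842% would mean the feature fires on roughly 8 or 9 of every 1000 tokens.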