INDEX
Negative Logits
Token       Logit
poffe       -0.91
myſelf      -0.90
Efq         -0.89
itſelf      -0.88
reaſon      -0.81
poffible    -0.79
themſelves  -0.78
Majefty     -0.78
ſtand       -0.78
pleaſure    -0.77
Positive Logits
Token       Logit
s           0.90
ers         0.74
ron         0.69
ry          0.66
ual         0.62
sun         0.61
r           0.60
ings        0.60
ative       0.59
son         0.58
Activations Density 0.189%