NEGATIVE LOGITS
purpoſe     -1.05
ſelves      -0.85
Theſe       -0.85
Houſe       -0.84
Efq         -0.83
Lycka       -0.83
pleaſure    -0.80
NDEBUG      -0.80
fhew        -0.79
Majefty     -0.79
POSITIVE LOGITS
.           0.74
for         0.71
that        0.70
because     0.62
such        0.60
like        0.60
caused      0.59
!           0.59
called      0.59
known       0.59
Activations Density 0.028%