Negative Logits
Token       Logit
Efq         -1.22
doubtnut    -1.19
pleaſure    -1.19
myſelf      -1.16
Monfieur    -1.16
Jefus       -1.13
$_"         -1.11
Anſ         -1.08
^(@)        -1.08
Houſe       -1.07
Positive Logits
Token       Logit
,           0.66
in          0.62
O           0.58
↵↵          0.57
↵           0.57
.           0.56
ur          0.55
no          0.54
(           0.53
med         0.53
Activations Density 0.025%