INDEX
Negative Logits
d           -0.70
POPULAR     -0.64
popular     -0.61
m           -0.61
s           -0.58
atrix       -0.58
enko        -0.57
populares   -0.57
t           -0.56
n           -0.54
Positive Logits
pleaſure    0.95
uſe         0.89
houſe       0.87
Diſ         0.87
purpoſe     0.87
Anſ         0.86
raiſ        0.85
Reſ         0.85
poffe       0.82
Majefty     0.81
Activations Density: 0.130%