NEGATIVE LOGITS
itſelf      -0.92
houſe       -0.90
Cæsar       -0.88
Efq         -0.85
Houſe       -0.82
dentaire    -0.82
ſta         -0.82
pleaſure    -0.80
purpoſe     -0.77
giapp       -0.77
POSITIVE LOGITS
sen         0.60
entile      0.60
ting        0.55
phalt       0.53
ness        0.52
sing        0.52
ging        0.51
ture        0.51
ses         0.48
ders        0.47
Activations Density: 0.270%