INDEX
Negative Logits
ta      -0.73
Li      -0.69
Mar     -0.69
di      -0.69
pos     -0.69
Ben     -0.68
var     -0.68
Gor     -0.68
J       -0.68
si      -0.68
Positive Logits
itſelf        1.63
myſelf        1.52
Monfieur      1.45
Majefty       1.45
Efq           1.41
houſe         1.38
Houſe         1.38
themſelves    1.34
pleaſure      1.32
himſelf       1.30
Activations Density: 0.088%