NEGATIVE LOGITS
Anſ        -1.17
pleaſure   -1.10
Monfieur   -1.05
Reſ        -1.02
Diſ        -0.99
Theſe      -0.98
iſt        -0.97
ſy         -0.97
Houſe      -0.96
ſta        -0.94
POSITIVE LOGITS
Paul       1.57
Paul       1.34
Paulson    1.30
PAUL       1.20
Paulus     1.13
PAUL       1.12
Paulina    1.06
paul       1.01
paul       1.00
Paula      0.96
Activations Density 0.005%