INDEX
Explanations
references to an individual named Paul
Negative Logits
pleaſure    -1.31
iſt         -1.19
purpoſe     -1.17
ſeveral     -1.15
ſche        -1.13
myſelf      -1.12
Anſ         -1.11
leaſt       -1.10
ſta         -1.10
ſelf        -1.10
Positive Logits
Paul        2.66
Paul        2.43
PAUL        2.05
paul        1.96
PAUL        1.82
paul        1.69
Paulson     1.14
ポール      1.14
Paulus      1.13
Paulo       0.89
Activations Density 0.094%