Explanations
apostrophes in words, particularly in possessive forms and contractions
Negative Logits
ulet       -0.16
ÙĦس        -0.16
Ãłng       -0.15
æľŃ        -0.15
ungs       -0.15
.secret    -0.15
outers     -0.14
ÑģÑĤ       -0.14
ple        -0.14
ernes      -0.14
Positive Logits
ourke      0.30
ullivan    0.28
Sullivan   0.25
Connor     0.24
Connell    0.23
Shea       0.23
Ri         0.22
Conor      0.21
Sha        0.21
hea        0.21
Activations Density 0.008%