INDEX
Explanations
mentions of names, particularly first names
Names followed by last names/initials
names and surnames
Negative Logits
[missing]   -0.56
.           -0.55
,           -0.53
(           -0.52
↵↵          -0.51
"           -0.51
“           -0.49
:           -0.48
m           -0.46
'           -0.45
Positive Logits
Monfieur    1.45
Houſe       1.42
Jefus       1.41
myſelf      1.35
Anſ         1.34
Theſe       1.34
houſe       1.32
Efq         1.31
Shakspeare  1.29
Reſ         1.29
Activations Density 0.144%