INDEX
Explanations
mentions of the word "them" to indicate an emphasis on a specific group or entity
Negative Logits
Racine        -0.89
purpoſe       -0.79
cauſe         -0.75
Monfieur      -0.72
uſe           -0.71
Efq           -0.71
pleaſure      -0.70
noel          -0.68
HRS           -0.68
Conci         -0.68
Positive Logits
themselves     1.48
Them           1.29
Them           1.26
themselves     1.25
them           1.17
selves         1.17
they           1.06
THEM           1.05
him            1.01
THEY           0.98
Activations Density 0.037%
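
The density figure above can be read as roughly 37 active positions per 100,000 tokens. The sketch below illustrates that reading, under the assumption (not stated on this page) that density means the fraction of sampled token positions where the feature's activation is above zero. The function name `activation_density`, the `threshold` parameter, and the toy data are all hypothetical and chosen only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" is the share of token positions on which the
    feature fires at all; the dashboard itself does not define the term.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: 100,000 token positions, 37 of them active.
toy = [0.0] * 100_000
for i in range(37):
    toy[i * 1000] = 1.5  # arbitrary positive activation value

density = activation_density(toy)
print(f"{density:.3%}")  # prints "0.037%"
```

A density this low means the feature is sparse: it is silent on the vast majority of tokens, which fits a feature tied to a narrow set of words such as "them" and "themselves".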