Explanations
pronouns referring to people, particularly in the context of interactions or relationships
Negative Logits
houſe        -1.17
Houſe        -1.10
cauſe        -1.09
Athenian     -1.05
raiſ         -1.05
Charlemagne  -1.04
pleaſure     -1.04
purpoſe      -1.03
ytale        -1.03
Weyl         -1.00
Positive Logits
us           0.97
them         0.82
us           0.82
self         0.80
him          0.79
Him          0.76
ness         0.75
</em>        0.74
as           0.74
Jim          0.74
Activations Density 0.090%