INDEX
Explanations
references to "them" or "him" indicating a focus on relationships or interactions with others
Negative Logits
للاسماء    -0.63
juſ        -0.62
pleaſure   -0.61
Diſ        -0.59
auroit     -0.56
Majefty    -0.54
panto      -0.54
becauſe    -0.54
anſ        -0.53
loue       -0.53
Positive Logits
them       1.70
them       1.36
Them       1.33
Them       1.30
THEM       1.23
him        0.98
us         0.86
них        0.84
eux        0.77
Him        0.77
Activations Density 0.092%