Explanations
pronouns related to group interactions
occurrences of the word "they" and its variations
Negative Logits
Token     Logit
fect      -0.71
THEM      -0.67
them      -0.64
âĺħ       -0.62
ones      -0.62
rior      -0.61
Monster   -0.61
fecture   -0.59
obia      -0.57
Domain    -0.56
Positive Logits
Token       Logit
selves      0.92
atically    0.78
're         0.77
wills       0.75
edin        0.70
eds         0.69
essor       0.67
itri        0.67
éĹĺ         0.67
selves      0.66
Activations Density: 0.331%
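
Two of the tokens in the logit lists ("âĺħ" and "éĹĺ") look like mis-encoded text. They may instead be byte-level BPE token strings, where each raw byte is displayed as a printable stand-in character. The sketch below assumes a GPT-2-style byte-to-character mapping, which this page does not confirm, and reverses it to recover the underlying UTF-8 text. The names `bytes_to_unicode` and `decode_token` are illustrative helpers, not part of the dashboard.

```python
def bytes_to_unicode():
    """Build the GPT-2-style map from byte value to printable stand-in character.

    Assumption: the tokens were produced by a tokenizer that uses this mapping.
    """
    # Bytes that are already printable keep their own code point.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte is shifted to a code point at 256 and above.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


# Inverse map: stand-in character back to the raw byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token):
    """Turn a displayed token string back into text via its raw UTF-8 bytes."""
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    # Tokens can end mid-character, so replace invalid sequences instead of raising.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["âĺħ", "éĹĺ", "selves"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Under that assumption, "âĺħ" decodes to the bytes E2 98 85 (U+2605, a black star) and "éĹĺ" to E9 97 98 (U+95D8, a CJK character), while plain ASCII tokens such as "selves" pass through unchanged. If the tokenizer behind this page uses a different scheme, these decoded values do not apply.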