INDEX
Explanations
references to plural pronouns, particularly "they."
Negative Logits
pomo      -0.67
を取る    -0.58
Paar      -0.55
mem       -0.54
Magenta   -0.53
almaz     -0.53
Staub     -0.52
біль      -0.52
mín       -0.52
fael      -0.52
Positive Logits
they      2.21
They      2.01
they      1.96
They      1.95
THEY      1.95
THEY      1.88
he        1.59
Their     1.39
Mereka    1.35
они       1.34
Activations Density: 0.094%