Explanations
collective pronouns indicating inclusivity and shared experience
Negative Logits
themselves  -0.21
re          -0.18
d           -0.17
nya         -0.17
ne          -0.17
was         -0.17
m           -0.17
noon        -0.16
w           -0.15
ctor        -0.15
Positive Logits
ourselves   0.45
all         0.31
athers      0.30
aves        0.30
eping       0.29
brtc        0.28
blink       0.27
eding       0.27
asel        0.26
aved        0.26
Activations Density 0.458%