INDEX
Explanations
plural pronouns, particularly referencing people or subjects in a discussion
Negative Logits
itself     -0.28
their      -0.17
ause       -0.17
которое    -0.16
otland     -0.16
他们       -0.16
its        -0.16
its        -0.16
loro       -0.15
their      -0.15
Positive Logits
're        0.19
chy        0.17
iner       0.17
’re        0.16
'RE        0.15
такими     0.15
OAD        0.15
alic       0.15
kich       0.14
efore      0.14
Activations Density 0.102%