INDEX
Explanations
pronouns referring to people or things previously mentioned
the pronoun "they" and its variations
Negative Logits
Eleven     -0.72
CCC        -0.70
thinking   -0.67
Kinn       -0.65
Brother    -0.64
Reply      -0.64
DAY        -0.63
nl         -0.63
Lansing    -0.61
Bucks      -0.60
Positive Logits
're          1.46
are          1.13
've          1.08
were         1.05
originate    1.04
contain      1.03
belong       1.02
represent    1.00
exist        0.99
originated   0.98
Activations Density 0.172%