INDEX
Explanations
the pronoun 'they' paired with verbs indicating actions or states
repeated references to the pronoun "they."
Negative Logits
Token       Logit
Eps         -0.77
Cumber      -0.70
Courier     -0.66
Vine        -0.63
Hunting     -0.62
Cobb        -0.62
Burg        -0.62
Wol         -0.62
Phi         -0.62
Vog         -0.62
Positive Logits
Token       Logit
've         1.18
'd          1.16
're         1.09
selves      0.97
zbollah     0.88
appre       0.88
'll         0.88
tasted      0.87
encount     0.87
disapprove  0.85
Activations Density 0.078%