Explanations
social media references and engagement metrics
attends to the first token of a person's name from a pronoun or other mention of the person later in the sequence.
Head Attr Weights
0:0.07
1:0.02
2:0.06
3:0.05
4:0.06
5:0.05
6:0.18
7:0.06
8:0.08
9:0.26
10:0.03
11:0.04
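
As a quick reading aid, the sketch below ranks the heads by the attribution weights listed above and reports how much of the listed total the top two account for. The dictionary is copied by hand from the listing; `head_weights` and `top_heads` are hypothetical names chosen for illustration, not part of any dashboard API. The listed values sum to 0.96, and the source does not say whether the remainder is rounding.

```python
# Head attribution weights copied from the listing above (head index -> weight).
head_weights = {
    0: 0.07, 1: 0.02, 2: 0.06, 3: 0.05, 4: 0.06, 5: 0.05,
    6: 0.18, 7: 0.06, 8: 0.08, 9: 0.26, 10: 0.03, 11: 0.04,
}


def top_heads(weights, k=2):
    """Return the k (head, weight) pairs with the largest weights."""
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:k]


top = top_heads(head_weights)
total = sum(head_weights.values())
# Fraction of the listed total carried by the top-k heads.
share = sum(w for _, w in top) / total

print(top)              # heads 9 and 6 lead the ranking
print(round(total, 2))  # sum of the listed weights
print(round(share, 2))  # share of that sum held by the top two heads
```

Under these numbers, heads 9 and 6 together carry a little under half of the listed attribution, with the remaining ten heads each at 0.08 or below.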
Negative Logits
Wyr      -4.05
Nib      -3.63
Scar     -3.47
Scrolls  -3.46
McF      -3.38
ritch    -3.37
TEXTURE  -3.29
tes      -3.18
Sky      -3.18
irrad    -3.17
Positive Logits
Joan      9.68
Jo        4.41
Lisbon    4.18
Fran      3.88
Manit     3.85
Jeanne    3.78
Henri     3.74
Jo        3.74
Peggy     3.64
Margaret  3.63
Activations Density 0.000%