INDEX
    Explanations

    social media references and engagement metrics

    attends from a pronoun or other later mention of a person back to the first token of that person's name.

    Head Attr Weights (head: weight)
    0: 0.07
    1: 0.02
    2: 0.06
    3: 0.05
    4: 0.06
    5: 0.05
    6: 0.18
    7: 0.06
    8: 0.08
    9: 0.26
    10: 0.03
    11: 0.04
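To see which heads dominate the attribution, the twelve weights listed above can be ranked and summed. This is a minimal sketch: the dictionary simply copies the listed values, and the names `head_attr_weights` and `rank_heads` are hypothetical helpers, not part of any dashboard API.

```python
# Head attribution weights copied from the listing above (head index -> weight).
head_attr_weights = {
    0: 0.07, 1: 0.02, 2: 0.06, 3: 0.05, 4: 0.06, 5: 0.05,
    6: 0.18, 7: 0.06, 8: 0.08, 9: 0.26, 10: 0.03, 11: 0.04,
}


def rank_heads(weights):
    """Return (head, weight) pairs sorted by descending weight."""
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)


ranked = rank_heads(head_attr_weights)
total = sum(head_attr_weights.values())

# Fraction of the listed weight carried by the two strongest heads.
top_two_share = sum(w for _, w in ranked[:2]) / total

for head, weight in ranked[:3]:
    print(f"head {head}: {weight:.2f}")
print(f"total listed weight: {total:.2f}")
print(f"share held by top two heads: {top_two_share:.1%}")
```

On these values, heads 9 and 6 rank first and second, and together they account for 0.44 of the 0.96 total listed weight. The listed weights do not sum to exactly 1, which may simply reflect rounding in the display.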
    Negative Logits (token: logit; quotes show leading spaces)
    " Wyr": -4.05
    " Nib": -3.63
    " Scar": -3.47
    " Scrolls": -3.46
    " McF": -3.38
    "ritch": -3.37
    "TEXTURE": -3.29
    "tes": -3.18
    "Sky": -3.18
    " irrad": -3.17
    Positive Logits (token: logit; quotes show leading spaces)
    " Joan": 9.68
    "Jo": 4.41
    " Lisbon": 4.18
    " Fran": 3.88
    " Manit": 3.85
    " Jeanne": 3.78
    " Henri": 3.74
    " Jo": 3.74
    " Peggy": 3.64
    " Margaret": 3.63
    Act Density 0.000%

    No Known Activations