INDEX
    Explanations

    punctuation and quotations, indicating dialogue or cited speech

    Negative Logits
    Token          Logit
    ","            -0.25
    "’s"           -0.18
    "´s"           -0.15
    ":"            -0.15
    " himself"     -0.14
    "'s"           -0.14
    "`s"           -0.14
    " Himself"     -0.14
    " seinen"      -0.13
    " lại"         -0.13

    Positive Logits
    Token          Logit
    "ÂĿ"           0.41
    "said"         0.34
    " according"   0.31
    " said"        0.30
    " she"         0.30
    "she"          0.28
    "according"    0.28
    " he"          0.28
    " added"       0.23
    "ÂĢÂ"          0.23
    Activation Density: 0.107%
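
As a rough illustration of what a figure like 0.107% measures: activation density is usually taken to be the share of token positions on which a feature fires. The sketch below assumes that definition. The function name `activation_density`, the zero threshold, and the sample values are hypothetical and are not taken from this dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of positions where the feature is
    active at all; the dashboard may define it differently.
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical activations for one feature over 8 token positions.
sample = [0.0, 0.0, 1.3, 0.0, 0.0, 0.0, 0.2, 0.0]
density = activation_density(sample)
print(f"{density:.3%}")  # prints 25.000%
```

Under this reading, a density of 0.107% would mean the feature is active on roughly one token in every 935.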

    No Known Activations