INDEX
    Explanations

    mentions of the word "her" in various contexts

    Negative Logits
    e           -0.15
    acent       -0.15
    amax        -0.15
    ering       -0.15
    ampo        -0.14
    erto        -0.14
    ahead       -0.14
    volt        -0.14
    annya       -0.14
    ington      -0.13
    Positive Logits
    /us         0.22
    cury        0.18
    zelf        0.17
    à¹īà¸ĩ      0.16
    /her        0.16
     yapmaya    0.15
    presence    0.14
    isers       0.14
    à¥ģल        0.14
    ATUS        0.13
    Activation Density: 0.066%

    No Known Activations