INDEX
    Explanations

    references to possessive pronouns and possessive language

    Negative Logits
    emd
    -0.16
    à¹Ĥà¸Ľà¸£
    -0.16
    andle
    -0.15
    oten
    -0.14
    ç¦
    -0.14
     ÐŁÑĢо
    -0.14
    ignon
    -0.13
    assel
    -0.13
    54
    -0.13
    anky
    -0.13
    Positive Logits
    onto
    0.15
     Angels
    0.15
    enger
    0.15
     angels
    0.15
    ãĥ©ãĥ¼
    0.14
    adow
    0.14
    ãĥ¬ãĥ¼
    0.14
    ola
    0.14
    erez
    0.14
    å¢
    0.14
    Activation Density: 0.010%

    No Known Activations