INDEX
    Explanations

    pronouns and their connections to actions or relationships

    Negative Logits
    Token            Logit
    " certainly"     -0.18
    "iek"            -0.17
    "oret"           -0.16
    "avor"           -0.15
    "缸å½ĵ"           -0.15
    "illis"          -0.15
    "mise"           -0.14
    " kin"           -0.14
    "rets"           -0.13
    "ì¼ĵ"            -0.13

    Positive Logits
    Token            Logit
    " bother"        0.27
    " chose"         0.27
    " suddenly"      0.26
    " chosen"        0.25
    " such"          0.24
    " choose"        0.24
    " à¤ĩतन"         0.23
    " so"            0.23
    " bothering"     0.22
    "chosen"         0.22
    Activation Density: 0.215%

    No Known Activations