INDEX
    Explanations

    instances of personal pronouns and their associated verbs or actions

    Negative Logits
    ega         -0.16
    ahn         -0.15
    kil         -0.15
     negatives  -0.14
    PLUS        -0.14
    cente       -0.14
    .plus       -0.14
    otts        -0.14
    Express     -0.14
    //{{        -0.13

    Positive Logits
    combe       0.15
    ylland      0.15
     «          0.14
     displ      0.14
    ernet       0.14
    eter        0.14
    *);↵↵       0.14
    seau        0.14
    atorio      0.13
    bias        0.13
    Act Density 0.158%

    No Known Activations