INDEX
    Explanations

    tokens that refer to individuals or personal identities

    Negative Logits
    agar          -0.16
    ts            -0.16
    usted         -0.15
    logan         -0.15
    tee           -0.15
     Individuals  -0.15
    -ie           -0.15
    tees          -0.14
    iais          -0.14
    gars          -0.14
    Positive Logits
    nal       0.33
    nel       0.31
    ality     0.29
    ified     0.27
    ajes      0.25
    aggi      0.25
    ae        0.24
    nell      0.24
    nels      0.24
    ifying    0.23
    Act Density 0.007%

    No Known Activations