    Explanations

    references to unspecified individuals or groups

    Negative Logits
    Token        Logit
    "sed"        -0.16
    "ts"         -0.16
    "wner"       -0.15
    "aries"      -0.15
    "lington"    -0.14
    "ÙĤر"        -0.14
    "ted"        -0.14
    "endor"      -0.14
    "tn"         -0.13
    "uyen"       -0.13
    Positive Logits
    Token        Logit
    " who"       0.26
    " else"      0.25
    "hood"       0.20
    "who"        0.19
    "age"        0.18
    "Who"        0.18
    " whom"      0.17
    "_else"      0.17
    " Who"       0.17
    "/group"     0.16
    Activation Density: 0.061%

    No Known Activations