    Explanations

    mentions of people, particularly in social contexts and interactions

    Negative Logits
    Token        Logit
    "Ñĥг"        -0.16
    "queda"      -0.15
    " flown"     -0.15
    " Loved"     -0.14
    " risen"     -0.14
    "itten"      -0.14
    "aret"       -0.14
    "ád"         -0.14
    "uchen"      -0.14
    "IVEN"       -0.14
    Positive Logits
    Token           Logit
    " was"          0.33
    "was"           0.28
    "_was"          0.28
    " Was"          0.25
    "Was"           0.25
    " were"         0.25
    " did"          0.25
    " wasn"         0.24
    " saw"          0.23
    " yesterday"    0.21
    Activation Density: 0.878%

    No Known Activations