INDEX
    Explanations

    possessive pronouns

    Negative Logits
    .over      -0.07
     있어      -0.07
     engaged   -0.07
    .add       -0.07
    .games     -0.07
    bread      -0.06
     playa     -0.06
     Gust      -0.06
     kỹ        -0.06
     Sek       -0.06
    Positive Logits
     ".        0.07
    419        0.07
    "';↵       0.07
    ностью     0.07
    joint      0.06
    pkg        0.06
    _BOOLEAN   0.06
               0.06
     ":"       0.06
    рует       0.06
    Act Density 0.024%

    No Known Activations