INDEX
    Explanations

    personal pronouns

    Negative Logits
    Problem         -0.07
     конферен       -0.07
     Resources      -0.07
     firefox        -0.07
    _patient        -0.06
    _press          -0.06
    goals           -0.06
    TypeID          -0.06
     miktar         -0.06
     YouTube        -0.06
    Positive Logits
     ند             0.07
    nard            0.06
    _ul             0.06
    heroes          0.06
    ajan            0.06
                    0.06
     cousins        0.06
     ()->           0.06
    ()["            0.05
     qualifier      0.05
    Act Density 0.055%

    No Known Activations