    Negative Logits
    _TD             -0.07
     alma           -0.06
    ارج             -0.06
    ))),            -0.06
    ntp             -0.06
     Plasma         -0.06
    Verb            -0.06
     expansions     -0.06
    liga            -0.06
    asal            -0.06
    Positive Logits
                    0.06
    kan             0.06
    ,可             0.06
    ug              0.06
     пораж          0.06
    -ui             0.06
     smarty         0.06
     proficient     0.06
     yellow         0.06
    dac             0.06
    Act Density 0.001%

    No Known Activations