    NEGATIVE LOGITS
     Xia        -0.08
     goto       -0.07
     scan       -0.07
     begr       -0.06
    írk         -0.06
     '::        -0.06
    40          -0.06
     Gat        -0.06
    020         -0.06
     após       -0.06
    POSITIVE LOGITS
     자연        0.06
    hit         0.06
    -treated    0.06
     náro       0.06
     стала      0.06
    ané         0.06
    argar       0.06
    redd        0.06
    ö           0.06
     contrace   0.06
    Activation Density: 0.015%

    No Known Activations