    Negative Logits
    Logit   Token
    -0.08   " justification"
    -0.08   " सच"
    -0.07   "မြ"
    -0.07   (no token text)
    -0.07   " upbringing"
    -0.07   " Adolesc"
    -0.07   "Tabbed"
    -0.07   "volt"
    -0.07   (no token text)
    -0.07   "percent"

    Positive Logits
    Logit   Token
    0.08    " крыш"
    0.08    "Planes"
    0.08    " بين"
    0.08    " SES"
    0.08    "_plane"
    0.08    " лини"
    0.08    " comuni"
    0.08    " überhaupt"
    0.07    " iki"
    0.07    (no token text)
    Activation Density: 0.014%

    No Known Activations