INDEX
    Explanations
    Negative Logits
    Token            Logit
    "ЮЛ"             -0.07
                     -0.07
    "ulet"           -0.06
    "ients"          -0.06
    " जन"            -0.06
    "_NEG"           -0.06
    "ělí"            -0.06
    " mužů"          -0.06
    " formulario"    -0.06
    "θρω"            -0.06
    Positive Logits
    Token            Logit
    " Read"          0.07
    " Inter"         0.07
    " Turkish"       0.06
    " subt"          0.06
    "bins"           0.06
    " le"            0.06
    " Fundamental"   0.06
    " radi"          0.06
    " Indonesian"    0.06
    "_container"     0.06
    Activation Density: 0.001%

    No Known Activations