INDEX
    Negative Logits
    Token          Logit
    " μο"          -0.07
    " پاد"         -0.06
    " зов"         -0.06
    "ahtar"        -0.06
    "@@@@"         -0.06
    "_look"        -0.06
    "ubes"         -0.06
    "ábado"        -0.06
    "_loan"        -0.06
    ".hasClass"    -0.06
    Positive Logits
    Token          Logit
    " ev"          0.07
    " bartender"   0.07
    " according"   0.07
    " resistant"   0.06
    " prices"      0.06
    " expensive"   0.06
    "Forecast"     0.06
    " reluctance"  0.06
    " کند"         0.06
    ".ReLU"        0.06
    Act Density 0.001%

    No Known Activations