INDEX
    Explanations
    Negative Logits
    Token  Logit
    " اعتماد"  -0.08
    " کودکان"  -0.07
    " polynomial"  -0.07
    "_APPRO"  -0.06
    " оз"  -0.06
    " способом"  -0.06
    " здоров"  -0.06
    " Frankfurt"  -0.06
    " بسیاری"  -0.06
    ".getOrder"  -0.06
    Positive Logits
    Token  Logit
    "หล"  0.07
    "ласти"  0.07
    " wäh"  0.06
    "\taudio"  0.06
    " Exodus"  0.06
    " ble"  0.06
    "scaled"  0.06
    "âk"  0.06
    "うち"  0.06
    "раб"  0.06
    Activation Density: 0.003%

    No Known Activations