INDEX
    Explanations
    Negative Logits
    Token         Logit
    " بها"        -0.07
    " Muslim"     -0.07
    ",false"      -0.07
    " konuştu"    -0.07
    " Earth"      -0.07
    " Susp"       -0.07
    "保持"        -0.07
    "스토"        -0.07
    " Uygu"       -0.06
    " iktidar"    -0.06
    Positive Logits
    Token         Logit
    "mpp"         0.06
    "23"          0.06
    "mare"        0.06
                  0.06
    "_skip"       0.05
    " pri"        0.05
    "gtest"       0.05
    " MILL"       0.05
    "_fre"        0.05
    "-role"       0.05
    Activation Density: 0.002%

    No Known Activations