INDEX
    Negative Logits
    irnya               0.44
     Novi               0.43
     حوزه               0.43
     Lạt                0.42
     напрямую           0.42
     ഇഷ്ട               0.42
    Gosudarstvennyj     0.41
     പിറ                0.41
     Starbucks          0.41
    市场的              0.41
    Positive Logits
     of                 0.52
    ,                   0.51
     when               0.51
     while              0.45
     apprehensive       0.44
                        0.43
    م                   0.43
     even               0.42
     जब                 0.42
    ю                   0.42
    Activation Density: 0.004%

    No Known Activations