    Negative Logits
    " മികച്ച"  0.46
    "帮你"  0.43
    " bättre"  0.42
    (token missing)  0.41
    " лучше"  0.40
    " ఏర్"  0.40
    " บวก"  0.40
    "ൊരു"  0.39
    " بہتر"  0.39
    " አማ"  0.39
    Positive Logits
    " diminished"  1.07
    "失去了"  1.06
    " decreased"  1.00
    " reduced"  0.96
    " loses"  0.96
    "失去"  0.91
    " perder"  0.88
    " lose"  0.87
    " переста"  0.86
    " mất"  0.84
    Activation Density: 0.301%

    No Known Activations