    Negative Logits
    " lacking"  0.37
    "धाना"  0.37
    "缺乏"  0.37
    " Willing"  0.37
    " manca"  0.37
    " faltando"  0.35
    " உண்மைய"  0.35
    " 부족"  0.35
    " throughput"  0.35
    " трудно"  0.35
    Positive Logits
    " jangan"  2.08
    "不要"  1.91
    " đừng"  1.78
    " Jangan"  1.75
    "Jangan"  1.73
    " 不要"  1.69
    "jangan"  1.61
    "อย่า"  1.56
    " อย่า"  1.54
    " never"  1.47
    Act Density 0.045%

    No Known Activations