INDEX
    Explanations
    Negative Logits
    Token    Logit
    " 불편"    0.81
    " පරි"    0.74
    " بررسی"    0.74
    " लष्"    0.73
                 0.73
    "ેર"    0.73
    "Kontakt"    0.72
    " 접근"    0.72
    " इंजेक्शन"    0.71
    " 处理"    0.70
    Positive Logits
    Token         Logit
    " winning"    3.31
    " wins"       2.98
    " win"        2.96
    "winning"     2.82
    "Winning"     2.79
    " Winning"    2.77
    "win"         2.37
    " Wins"       2.35
    " victory"    2.35
    "wins"        2.32
    Act Density 0.395%

    No Known Activations