    NEGATIVE LOGITS
    "我可以"  0.42
    "یثیت"  0.39
    " moods"  0.38
    " energías"  0.38
    "精力"  0.37
    "ٗ"  0.37
    "🐨"  0.37
    (blank)  0.37
    " magnitudes"  0.36
    "үм"  0.36
    POSITIVE LOGITS
    " but"  0.70
    " 있지만"  0.69
    "지만"  0.68
    "이지만"  0.64
    "but"  0.63
    " nhưng"  0.61
    "하지만"  0.61
    " लेकिन"  0.57
    "但不"  0.57
    "लेकिन"  0.56
    Act Density 0.077%

    No Known Activations