INDEX
    Explanations

    contrasting traditional with new

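    Negative Logits
      Token (quoted)      Value   English gloss
      " nejen"            0.66    Czech, "not only"
      "不仅仅"            0.58    Chinese, "not only / not merely"
      " bukan"            0.57    Indonesian/Malay, "not, is not"
      " Bukan"            0.55    Indonesian/Malay, "not, is not" (capitalized)
      "ไม่ใช่"             0.55    Thai, "is not"
      "并非"              0.54    Chinese, "is not / not actually"
      " вместо"           0.54    Russian, "instead of"
      " instead"          0.54
      "instead"           0.48
      " unusually"        0.48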
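    Positive Logits
      Token (quoted)      Value   English gloss
      " comparatively"    0.70
      " relativamente"    0.66    Spanish/Portuguese/Italian, "relatively"
      " Relatively"       0.65
      "relatively"        0.63
      " relatively"       0.62
      " relativement"     0.55    French, "relatively"
      " whereas"          0.54
      "比較的"            0.54    Japanese, "relatively / comparatively"
      " digamos"          0.52    Spanish/Portuguese, "let's say"
      "মোটামুটি"           0.50    Bengali, "more or less / roughly"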
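    The sketch below shows one common way that lists like these are produced. It assumes that a feature's direct effect on the output logits is its decoder direction multiplied by the model's unembedding matrix, and that the tokens at each end of the resulting vector are reported. The names `decoder_dir`, `W_U`, `vocab` and `top_logit_tokens` are hypothetical. The sketch ignores the final layer norm and any later layers, and it returns signed values.

```python
import numpy as np

def top_logit_tokens(decoder_dir, W_U, vocab, k=10):
    """Return the k most positive and k most negative tokens for one feature.

    Hypothetical setup: decoder_dir has shape (d_model,), W_U has shape
    (d_model, vocab_size), and vocab maps token ids to token strings.
    """
    # Direct logit effect of the feature direction on every vocabulary token.
    logits = decoder_dir @ W_U  # shape: (vocab_size,)
    order = np.argsort(logits)  # token ids sorted by ascending logit
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return positive, negative

# Toy example: identity unembedding, so the logits equal decoder_dir itself.
vocab = ["instead", " relatively", " whereas", " the"]
pos, neg = top_logit_tokens(np.array([-0.5, 0.6, 0.2, 0.0]), np.eye(4), vocab, k=2)
print(pos)  # [(' relatively', 0.6), (' whereas', 0.2)]
print(neg)  # [('instead', -0.5), (' the', 0.0)]
```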
    Activation Density: 0.095%
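    A density of 0.095% means the feature fires on about one token in every 1,050 (1 / 0.00095 ≈ 1,053). The minimal sketch below assumes that density is the percentage of tokens whose activation is strictly above zero. The function name `activation_density` and its `threshold` parameter are hypothetical.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Percentage of tokens on which the feature fires (activation > threshold)."""
    activations = np.asarray(activations)
    return 100.0 * np.count_nonzero(activations > threshold) / activations.size

# Toy example: the feature is active on 10 of 10,000 tokens.
acts = np.zeros(10_000)
acts[:10] = 1.0
print(activation_density(acts))  # 0.1 (percent)
```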

    No Known Activations