    Explanations

    stopping or pausing actions

    Negative Logits
    distinta       0.89
    Different      0.89
    വ്യത്യസ്ത      0.86
    different      0.86
    distinto       0.86
    的不同          0.85
    différente     0.84
    अनेक           0.82
    разные         0.82
    diferentes     0.81
    Positive Logits
    altogether     1.50
    completely     1.30
    unnecessary    1.26
    offending      1.18
    entirely       1.16
    unwanted       1.15
                   1.13
    Completely     1.11
    further        1.11
    helt           1.07
    Activation Density: 0.169%

    No Known Activations