    Explanations

    foreign contrast words

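    Negative Logits
     According        1.77
     Unlike           1.66
     Unfortunately    1.63
     Sadly            1.63
     Ultimately       1.62
     Since            1.61
     Notably          1.59
     Importantly      1.59
     Perhaps          1.58
     Among            1.57

    Positive Logits
     dennoch          0.95   (German: "nevertheless")
    этому             0.86   (Russian: "to this")
                      0.84   (token not rendered)
     nonetheless      0.84
    nên               0.84   (Vietnamese: "so; should")
    তবু               0.83   (Bengali: "still, yet")
     trotzdem         0.83   (German: "nevertheless")
     tamen            0.83   (Latin: "nevertheless")
    ยัง               0.82   (Thai: "still, yet")
     그래도           0.81   (Korean: "even so")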
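Dashboards of this kind commonly derive the positive and negative logit lists by projecting a feature's decoder direction through the model's unembedding and ranking vocabulary tokens by the result. The sketch below assumes that procedure; it is not confirmed to be how this page computed its numbers. The function names `logit_effects` and `top_logits`, the plain-list inputs, and the toy vocabulary are hypothetical, and any final layer norm is ignored.

```python
# Hypothetical sketch: direct logit effect of one feature's decoder direction.
# Assumes the decoder vector and per-token unembedding columns are plain lists
# and ignores any final layer norm the model may apply before unembedding.
from typing import Dict, List, Sequence, Tuple

Ranked = List[Tuple[str, float]]


def logit_effects(
    decoder_vec: Sequence[float],
    unembed: Dict[str, Sequence[float]],
) -> Dict[str, float]:
    """Dot product of the decoder vector with each token's unembedding column."""
    return {
        token: sum(d * u for d, u in zip(decoder_vec, column))
        for token, column in unembed.items()
    }


def top_logits(effects: Dict[str, float], k: int) -> Tuple[Ranked, Ranked]:
    """Return the k tokens pushed up most and the k tokens pushed down most."""
    ranked = sorted(effects.items(), key=lambda item: item[1])
    negative = ranked[:k]  # most negative scores first
    positive = ranked[::-1][:k]  # most positive scores first
    return positive, negative


if __name__ == "__main__":
    # Toy 2-dimensional example with a three-token vocabulary.
    unembed = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [-1.0, 0.0]}
    effects = logit_effects([2.0, 0.5], unembed)
    print(top_logits(effects, 1))
```

The sketch returns signed scores for both lists; whether the dashboard above reports the negative side as signed values or as magnitudes is not stated.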
    Activation density: 0.028%
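Activation density is usually read as the share of token positions on which a feature fires. A minimal sketch, assuming "fires" means an activation strictly above a threshold of 0; the function name `activation_density` and the threshold default are hypothetical, not taken from this dashboard.

```python
# Hypothetical sketch: activation density as the percentage of tokens on which
# a feature's activation exceeds a threshold (assumed to be 0 by default).
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions where the feature fires above `threshold`."""
    if len(activations) == 0:
        raise ValueError("need at least one activation value")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


if __name__ == "__main__":
    # 28 firing positions out of 100,000 tokens.
    acts = [0.0] * 99_972 + [1.0] * 28
    print(f"{activation_density(acts):.3f}%")
```

At the 0.028% reported for this feature, that corresponds to roughly 28 firing positions per 100,000 tokens.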

    No Known Activations