INDEX
    Explanations

    introducing contrast though

    Negative Logits
     justru         0.55
     Despite        0.52
     Unfortunately  0.48
    Despite         0.48
     nawet          0.47
    居然            0.47
    bla             0.46
    EVEN            0.44
     mesmo          0.43
     несмотря       0.43
    Positive Logits
     although       1.56
     though         1.50
    although        1.45
     хотя           1.38
    though          1.33
     aunque         1.33
     embora         1.30
     choć           1.30
     Хотя           1.13
    虽然            1.13
    Act Density 0.193%

    No Known Activations