INDEX
    Explanations

    phrases indicating contrast or opposition

    Negative Logits
    Token       Logit
    "'"         -0.69
    "WA"        -0.55
    ";"         -0.54
    "zeiro"     -0.52
    "X"         -0.51
    " kem"      -0.51
    "vov"       -0.51
    "Ge"        -0.51
    "ћа"        -0.51
    "ibm"       -0.50
    Positive Logits
    Token             Logit
    "ostante"         1.88
    " despite"        1.55
    "despite"         1.48
    " Trotz"          1.46
    " Despite"        1.45
    "Despite"         1.41
    " nonostante"     1.41
    "withstanding"    1.36
    " Malgré"         1.34
    " spite"          1.33
    Activation Density: 0.092%

    No Known Activations