    Explanations

    phrases or words that express contrast or contradiction

    Negative Logits
    Token          Logit
    "'"            -0.62
    "hassee"       -0.55
    "יצד"          -0.55
    "WA"           -0.54
    " Chartres"    -0.53
    "Ge"           -0.52
    " Torino"      -0.50
    "jazdu"        -0.50
    "imedes"       -0.50
    " Skinner"     -0.49
    Positive Logits
    Token          Logit
    "ostante"      1.82
    " despite"     1.47
    " Despite"     1.41
    "Despite"      1.36
    "despite"      1.36
    " nonostante"  1.34
    " Malgré"      1.34
    " spite"       1.32
    " Trotz"       1.29
    "Trotz"        1.28
    Activation Density: 0.080%

    No Known Activations