INDEX
    Explanations

    expressions of disagreement

    Negative Logits
    Token             Logit
    " ?..."           -0.82
    " !..."           -0.81
    " Kün"            -0.79
    " emphat"         -0.78
    " Simult"         -0.77
    " impractica"     -0.76
    " unlaw"          -0.75
    " Fasc"           -0.75
    " effe"           -0.71
    " fuf"            -0.69
    Positive Logits
    Token             Logit
    " disagree"       1.11
    " disagrees"      0.91
    " agree"          0.84
    " disagreed"      0.82
    " disagreement"   0.77
    " agrees"         0.76
    " agreement"      0.75
    "Agree"           0.74
    "agree"           0.69
    " Disagree"       0.68
    Act Density 0.084%

    No Known Activations