INDEX
    Explanations

    statements expressing disagreement

    expressions of disagreement

    Negative Logits
    "amina"     -0.80
    "GV"        -0.71
    "Ãł"        -0.67
    " Roads"    -0.65
    "oufl"      -0.63
    "mary"      -0.63
    "maximum"   -0.63
    "annis"     -0.62
    "spring"    -0.62
    " Roller"   -0.61
    Positive Logits
    " disagree"      0.85
    "edIn"           0.83
    " vehemently"    0.81
    "llah"           0.78
    "ingly"          0.77
    "rences"         0.76
    "ially"          0.75
    "ively"          0.73
    "lihood"         0.73
    " unanimously"   0.72
    Activation Density: 0.027%

    No Known Activations