INDEX
    Explanations

    phrases or sentences indicating disagreement

    instances of disagreement or dissenting opinions

    Negative Logits
    "amina"          -0.79
    "GV"             -0.74
    "oufl"           -0.69
    " Roads"         -0.69
    "spring"         -0.68
    " Jackets"       -0.65
    "maximum"        -0.64
    " adrenaline"    -0.62
    "Ãł"             -0.62
    "uxe"            -0.61
    Positive Logits
    "rences"         0.88
    "ially"          0.85
    " vehemently"    0.82
    " disagree"      0.81
    "uously"         0.78
    "edIn"           0.78
    "lihood"         0.76
    "llah"           0.71
    "atively"        0.71
    "uous"           0.71
    Activation Density: 0.026%

    No Known Activations