    Explanations

    phrases indicating contrasting or contradictory statements

    affirmations or expressions of agreement

    Negative Logits
    ]:              -0.79
    lier            -0.76
    ESE             -0.72
    interstitial    -0.71
    uten            -0.69
    Connector       -0.69
    backer          -0.69
    ]).             -0.68
    opsis           -0.68
    ipel            -0.68
    Positive Logits
     admittedly     0.98
     technically    0.95
     disagree       0.89
     inconvenience  0.79
     but            0.77
     imperfect      0.76
     superf         0.75
     laud           0.74
     initially      0.73
     merits         0.71
    Activation Density: 0.607%

    No Known Activations