INDEX
    Explanations

    phrases related to agreement or disagreement

    phrases indicating disagreement or criticism

    Negative Logits
    Token                 Logit
    "ario"                -0.68
    "atto"                -0.65
    "iture"               -0.65
    " notwithstanding"    -0.62
    " Donation"           -0.61
    "etermination"        -0.60
    " solution"           -0.60
    "leep"                -0.60
    " disposed"           -0.59
    "urated"              -0.59
    Positive Logits
    Token                 Logit
    " those"              0.72
    "those"               0.69
    " these"              0.68
    " our"                0.68
    " us"                 0.64
    "them"                0.64
    "their"               0.64
    "Adams"               0.61
    " their"              0.60
    " his"                0.60
    Activation Density: 0.150%

    No Known Activations