INDEX
    Explanations

    phrases related to critiques or evaluations

    Negative Logits
    ngth            -0.75
    lished          -0.70
    thood           -0.67
    apons           -0.64
    interrupted     -0.64
    successfully    -0.63
    iencies         -0.63
    enaries         -0.62
    reys            -0.61
    gang            -0.61
    Positive Logits
     considering     1.15
     huh             0.86
     eh              0.76
     given           0.74
     Canaver         0.74
     coincidence     0.74
     hindsight       0.70
     hypocritical    0.67
     omission        0.67
     understatement  0.67
    Act Density: 2.003%

    No Known Activations