INDEX
    Explanations

    phrases emphasizing contrast or preference

    phrases that express negation or contrastive ideas

    Negative Logits
    Token            Logit
    " Norn"          -0.68
    "nec"            -0.67
    "omatic"         -0.64
    "CLASSIFIED"     -0.64
    " clearance"     -0.63
    "CF"             -0.63
    "ced"            -0.62
    "}}}"            -0.62
    "cia"            -0.62
    " Grounds"       -0.61
    Positive Logits
    Token            Logit
    " reinvent"      0.93
    " rely"          0.87
    " blindly"       0.86
    " succumb"       0.85
    " relying"       0.84
    " necessarily"   0.84
    " simply"        0.83
    " merely"        0.83
    " speculate"     0.81
    "rahim"          0.78
    Act Density 0.156%

    No Known Activations