INDEX
    Explanations

    direct responses of negation or refusal

    negative responses or denials

    Negative Logits
    "ortment"       -0.72
    " reverted"     -0.65
    "kefeller"      -0.63
    "drawn"         -0.60
    "rored"         -0.60
    "sing"          -0.58
    "pez"           -0.58
    "ylan"          -0.57
    "riers"         -0.57
    "abal"          -0.57
    Positive Logits
    " sir"          0.96
    " Nope"         0.91
    "terday"        0.89
    "!"             0.88
    "!,"            0.83
    "!."            0.83
    " Absolutely"   0.79
    " worries"      0.78
    "!!!!"          0.77
    "."             0.77
    Activation Density: 0.072%

    No Known Activations