INDEX
    Explanations

    phrases related to certainty or confirmation

    strong expressions of denial or disagreement

    Negative Logits
    Token           Logit
    "icipated"      -0.78
    " Siber"        -0.74
    " bathrooms"    -0.68
    " Tec"          -0.67
    " restrooms"    -0.66
    " Jensen"       -0.64
    " downstream"   -0.61
    " populated"    -0.61
    " transformer"  -0.60
    " exploits"     -0.60
    Positive Logits
    Token           Logit
    "Yeah"          1.11
    "Yes"           1.01
    "Hmm"           0.99
    "Answer"        0.99
    "Exactly"       0.98
    "YES"           0.96
    "Correct"       0.94
    " sir"          0.93
    "Absolutely"    0.92
    " Exactly"      0.91
    Act Density 0.533%

    No Known Activations