    Explanations

    confidence and certainty in statements

    Negative Logits
    iless         -0.71
    thood         -0.70
    ories         -0.69
    psey          -0.66
    matically     -0.63
    cially        -0.63
    idas          -0.62
    vati          -0.60
    ilaterally    -0.60
    inth          -0.59
    Positive Logits
     surprises    0.70
     Rampage      0.70
     admire       0.67
     plenty       0.66
     delight      0.65
     delighted    0.65
    âĶĢ           0.65
     adore        0.63
     grinning     0.63
     displeasure  0.62
    Activation Density: 3.198%

    No Known Activations