INDEX
    Explanations

    instances of the word "Sure" with a strong activation

    affirmative phrases that express certainty

    Negative Logits
    Token          Logit
    "tnc"          -0.69
    "humane"       -0.68
    "foreseen"     -0.66
    "andom"        -0.60
    "mercial"      -0.60
    "mone"         -0.59
    "uese"         -0.59
    " resil"       -0.59
    "rights"       -0.58
    "utenberg"     -0.57
    Positive Logits
    Token          Logit
    "ndra"         0.99
    "ty"           0.82
    " enough"      0.80
    "entimes"      0.78
    "fire"         0.70
    "terday"       0.70
    "footed"       0.68
    " tack"        0.67
    "ties"         0.65
    "tt"           0.65
    Activation Density 0.017%

    No Known Activations