    Explanations

    words that are verbs

    Negative Logits
    encies       -0.80
    grounds      -0.78
    artifacts    -0.77
    things       -0.76
    adj          -0.75
    igne         -0.74
    ons          -0.74
    tests        -0.73
    Iss          -0.73
    evidence     -0.73
    Positive Logits
     bang         1.36
     vengeance    1.30
     twist        1.13
     smile        1.08
     penchant     1.03
     flourish     1.03
     caveat       1.03
     grin         1.02
     wink         0.98
     shrug        0.97
    Activation Density: 0.139%

    No Known Activations