INDEX
    Explanations

    instances where someone takes authoritative action

    Negative Logits
        " gaily"        -0.93
        " apprehen"     -0.91
        " nobly"        -0.90
        " vainly"       -0.89
        " ineffec"      -0.89
        " inconce"      -0.87
        " unspeak"      -0.84
        " tolerably"    -0.81
        " disagre"      -0.78
        " disgra"       -0.77
    Positive Logits
        " WITH"          0.69
        "WITH"           0.64
        "pertise"        0.63
        "with"           0.61
        " sentito"       0.58
        "sightly"        0.58
        " gusto"         0.57
        " soggior"       0.56
        " skimage"       0.56
        "With"           0.56
    Activation density: 0.218%

    No Known Activations