    Explanations

    words or parts of words containing "ew"

    Negative Logits
     cort       -0.71
     retri      -0.67
    Downloadha  -0.66
     burg       -0.65
     unarmed    -0.63
     apprehend  -0.63
     thieves    -0.63
    REDACTED    -0.62
     administr  -0.61
    esthetic    -0.61
    Positive Logits
    estern      1.24
    atts        0.99
    esley       0.98
    ew          0.97
    een         0.96
    sburg       0.95
    olf         0.93
    ild         0.92
    eeks        0.92
    ITNESS      0.91
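    Lists like the two above show the extremes of a per-token score over the vocabulary: the k highest scores as positive logits and the k lowest as negative logits. A minimal sketch of that selection step, assuming the scores are already available as a token-to-value mapping. The function name `top_and_bottom` and the `scores` dict are hypothetical, and the dict reuses only a few of the values listed above. How the scores themselves are computed from the model is not shown.

```python
import heapq


def top_and_bottom(
    scores: dict[str, float], k: int
) -> tuple[list[tuple[str, float]], list[tuple[str, float]]]:
    """Return the k highest- and k lowest-scoring tokens.

    The top list is sorted from highest down; the bottom list is sorted
    from lowest up, so both start with the most extreme token.
    """
    top = heapq.nlargest(k, scores.items(), key=lambda kv: kv[1])
    bottom = heapq.nsmallest(k, scores.items(), key=lambda kv: kv[1])
    return top, bottom


# Hypothetical subset of the logit values shown in this dashboard.
# Leading spaces are part of the token strings.
scores = {
    "estern": 1.24, "atts": 0.99, "esley": 0.98, "ew": 0.97,
    " cort": -0.71, " retri": -0.67, "Downloadha": -0.66, " burg": -0.65,
}

top, bottom = top_and_bottom(scores, k=2)
print(top)     # [('estern', 1.24), ('atts', 0.99)]
print(bottom)  # [(' cort', -0.71), (' retri', -0.67)]
```

    Using `heapq.nlargest`/`nsmallest` avoids sorting the whole vocabulary when only a handful of entries are displayed.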
    Activation Density: 0.009%

    No Known Activations
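    Activation density is commonly read as the percentage of sampled tokens on which the feature fires. A minimal sketch, assuming "fires" means an activation strictly above a threshold of zero. The function `activation_density`, the threshold, and the toy activations are hypothetical illustrations, not the dashboard's own computation.

```python
def activation_density(activations: list[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold`."""
    if not activations:
        return 0.0  # no samples, so report zero density
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Toy example: 9 active tokens out of 100,000 sampled tokens.
acts = [0.0] * 100_000
for i in range(9):
    acts[i * 1000] = 1.5  # mark a few positions as active
print(f"{activation_density(acts):.3f}%")  # 0.009%
```

    At a density this low, a feature fires on roughly 1 token in 11,000, which is why a sample may contain few or no recorded activations.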