    Explanations

    phrases related to negative outcomes or shortcomings

    instances of the word "failure."

    Negative Logits
    Token          Logit
    "selves"       -0.80
    "enfranch"     -0.70
    "rete"         -0.70
    "utra"         -0.68
    "esthetic"     -0.65
    "estamp"       -0.64
    "Ec"           -0.64
    "arbon"        -0.62
    "orgetown"     -0.62
    "ocard"        -0.61

    Positive Logits
    Token          Logit
    " miser"        1.08
    " failures"     0.87
    "DEV"           0.82
    "Failure"       0.81
    " failure"      0.81
    "ulence"        0.74
    " rate"         0.73
    "istence"       0.72
    "luster"        0.72
    " Failure"      0.71

    Activation Density: 0.024%

    No Known Activations
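
To give a sense of scale for the activation density figure above, here is a minimal sketch. It assumes density means the fraction of token positions on which the feature's activation is above zero; the page does not state the definition. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and for illustration only.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of positions where the feature fires.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 100,000 token positions, 24 of which are active.
sample = [0.0] * 99_976 + [1.5] * 24
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.024%
```

Under that assumption, a density of 0.024% corresponds to roughly 24 active positions per 100,000 tokens, which would make this a sparse feature.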