INDEX
    Explanations

    instances where something has failed or been unsuccessful

    instances of the word "failed."

    Negative Logits
    Token         Logit
    "enfranch"    -0.81
    "selves"      -0.74
    "til"         -0.71
    "istics"      -0.71
    "utra"        -0.68
    "edged"       -0.66
    "tip"         -0.66
    "inda"        -0.65
    " Layer"      -0.64
    "ized"        -0.64
    Positive Logits
    Token         Logit
    " miser"      1.28
    "fail"        0.92
    " failures"   0.91
    "DEV"         0.87
    " fail"       0.85
    " Failed"     0.84
    " failure"    0.81
    " catast"     0.81
    " dism"       0.78
    " horribly"   0.76
    Activation Density: 0.017%

    No Known Activations