    Explanations

    words related to failures or negative outcomes

    references to failures in various contexts

    Negative Logits
    "rete"          -0.69
    "enfranch"      -0.68
    "bern"          -0.68
    "selves"        -0.67
    "population"    -0.66
    "atu"           -0.66
    "riel"          -0.65
    "utra"          -0.65
    "rosse"         -0.65
    "irin"          -0.64
    Positive Logits
    " miser"        1.30
    " dism"         0.82
    "DEV"           0.81
    " failures"     0.79
    " catast"       0.78
    "Failure"       0.78
    " horribly"     0.77
    "afe"           0.73
    "lust"          0.72
    "fail"          0.71
    Activation Density: 0.029%

    No Known Activations