INDEX
    Explanations

    words related to weakness or vulnerability

    references to weakness or frailty

    Negative Logits
    "ICAN"          -0.81
    "APH"           -0.75
    "agher"         -0.74
    "andise"        -0.71
    "alogue"        -0.71
    " Noir"         -0.69
    " Sloan"        -0.68
    " Andromeda"    -0.68
    " McCann"       -0.68
    "ICA"           -0.67
    Positive Logits
    "nesses"     1.27
    "lings"      1.18
    "ling"       0.99
    "ening"      0.92
    "ens"        0.91
    "ener"       0.87
    "ened"       0.86
    "est"        0.85
    "les"        0.84
    " minded"    0.82
    Activation Density: 0.013%

    No Known Activations