INDEX
    Explanations

    words related to irreversibility or being beyond repair

    Negative Logits
    Token           Logit
    "GD"            -0.73
    " Nights"       -0.70
    "ucket"         -0.67
    "KER"           -0.65
    "anwhile"       -0.65
    "HR"            -0.64
    " Butterfly"    -0.63
    "OHN"           -0.62
    "creen"         -0.62
    "arta"          -0.62
    Positive Logits
    Token           Logit
    "voc"           1.41
    "parable"       1.30
    "ceivable"      1.09
    "utive"         0.96
    "hibited"       0.95
    "serious"       0.92
    "vious"         0.90
    "medi"          0.89
    "itable"        0.89
    "itive"         0.89
    Act Density 0.099%

    No Known Activations