    Explanations

    Sentences about the consequences of actions or decisions, with a focus on potentially severe outcomes.

    Negative Logits
    antik      -1.00
    alkoh      -0.97
    fers       -0.92
    plak       -0.89
    meis       -0.89
    elek       -0.88
    silikon    -0.87
    kram       -0.86
    ché        -0.85
    lele       -0.84

    Positive Logits
    fatalities      0.79
    death           0.76
    deaths          0.76
    fatality        0.74
    harm            0.69
    irreversible    0.65
    tragedy         0.65
    bloodshed       0.61
    destruction     0.60
    tragic          0.59

    Activation Density: 0.618%

    No Known Activations