INDEX
    Explanations

    sentences related to legal matters and consequences

    references to potential harmful actions or conditions, particularly related to health and societal issues

    Negative Logits
    " Carnage"      -0.73
    " Skydragon"    -0.72
    " Allies"       -0.72
    "eers"          -0.70
    " Survivors"    -0.65
    "oday"          -0.65
    "ESCO"          -0.64
    "ccording"      -0.63
    "ĸļ"            -0.62
    "ebus"          -0.62
    Positive Logits
    "wu"            0.72
    "|"             0.69
    "nob"           0.65
    " gmaxwell"     0.63
    "bas"           0.62
    "uno"           0.61
    "hao"           0.59
    " prin"         0.59
    " <<"           0.57
    " *"            0.56
    Activation Density: 0.130%

    No Known Activations