    Explanations

    references to injuries and accidents

    references to cultural or artistic critiques

    Negative Logits
    "[/"               -0.82
    "shown"            -0.82
    " utilizing"       -0.74
    " comprised"       -0.73
    "¶ħ"               -0.72
    " prior"           -0.72
    " âμ"              -0.71
    "foreseen"         -0.69
    " approximately"   -0.68
    "approximately"    -0.68
    Positive Logits
    "eds"              0.81
    " quarrel"         0.70
    " medicines"       0.70
    " bribe"           0.68
    " supermarkets"    0.67
    "Newsletter"       0.67
    " starve"          0.66
    " complains"       0.65
    " beware"          0.65
    " piety"           0.63
    Activation Density: 1.637%

    No Known Activations