    Explanations

    phrases related to violent incidents or accidents involving individuals

    actions or events involving explosions or acts of violence

    Negative Logits
    Token              Logit
    " equivalents"     -0.72
    "↑"                -0.68
    " blah"            -0.66
    " anymore"         -0.62
    "────"             -0.62
    " sorted"          -0.62
    "=-=-=-=-"         -0.61
    " squared"         -0.61
    " curated"         -0.60
    "equal"            -0.59
    Positive Logits
    Token              Logit
    " himself"         1.00
    " explosives"      0.87
    " fatally"         0.81
    "edly"             0.78
    "andals"           0.77
    " explosive"       0.76
    "angering"         0.76
    " suicide"         0.75
    " rampage"         0.72
    " Himself"         0.71
    Activation Density: 0.275%

    No Known Activations