    Explanations

    instances of violent events or accidents involving individuals

    NEGATIVE LOGITS
        "ouch"        -0.17
        " Assass"     -0.15
        "lide"        -0.15
        "_Impl"       -0.15
        "ource"       -0.13
        "achen"       -0.13
        " quis"       -0.13
        "sembled"     -0.13
        "219"         -0.13
        " betr"       -0.13

    POSITIVE LOGITS
        " whose"      0.15
        "Ðİ"          0.14
        "ãģĸ"         0.14
        "sez"         0.14
        " alleged"    0.14
        " Dit"        0.14
        "LOPT"        0.14
        " charged"    0.14
        " self"       0.14
        "TERN"        0.14
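Logit lists like these are commonly produced by projecting the feature's decoder direction through the model's unembedding matrix and reading off the most positive and most negative entries. The sketch below assumes that recipe; the function names `logit_effects` and `top_and_bottom`, the toy 2-dimensional model, and the five-token vocabulary are all hypothetical, and it skips any final LayerNorm scaling a real pipeline might fold in.

```python
# Minimal sketch (assumed recipe): top/bottom logit effects of one feature.
# All names and the toy numbers are hypothetical, not taken from this dashboard.

def logit_effects(decoder_dir, w_u):
    """Return decoder_dir @ w_u: one logit contribution per vocabulary token.

    decoder_dir: list of length d_model (the feature's decoder direction).
    w_u: d_model rows, each a list of vocab_size floats (the unembedding).
    """
    vocab_size = len(w_u[0])
    return [
        sum(decoder_dir[i] * w_u[i][t] for i in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]


def top_and_bottom(effects, vocab, k):
    """Split tokens into the k most positive and k most negative effects."""
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]                  # smallest values first
    positive = list(reversed(ranked))[:k]  # largest values first
    return positive, negative


# Toy model: d_model = 2, vocabulary of five made-up tokens.
vocab = ["A", "B", "C", "D", "E"]
w_u = [
    [0.3, -0.2, 0.1, -0.4, 0.0],
    [1.0, 1.0, 1.0, 1.0, 1.0],
]
decoder_dir = [1.0, 0.0]

effects = logit_effects(decoder_dir, w_u)
positive, negative = top_and_bottom(effects, vocab, k=2)
print("POSITIVE", positive)  # [('A', 0.3), ('C', 0.1)]
print("NEGATIVE", negative)  # [('D', -0.4), ('B', -0.2)]
```

Because only the feature's own direction is projected, each value reflects how much that feature alone pushes a token's logit up or down, independent of any particular input.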
    Activation density: 0.110%
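Activation density is commonly defined as the fraction of token positions on which the feature's activation is above zero. Assuming that definition (the `threshold` parameter, the function name `activation_density`, and the synthetic activations are illustrative, not taken from this dashboard), 0.110% corresponds to about 11 firing positions in every 10,000 tokens:

```python
# Minimal sketch (assumed definition): activation density = share of token
# positions where the feature's activation exceeds a threshold (0 by default).

def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation is above threshold."""
    if not activations:
        raise ValueError("activations must be non-empty")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Synthetic example: 10,000 token positions, the feature fires on 11 of them.
acts = [0.0] * 10_000
for i in range(11):
    acts[i * 900] = 1.0

density = activation_density(acts)
print(f"{density:.3%}")  # 0.110%
```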

    No Known Activations