INDEX
    Explanations

    references to violent or aggressive actions

    Negative Logits
    ality    -0.16
    olu      -0.16
    arity    -0.15
    ══       -0.15
    .au      -0.15
    いる     -0.15
    verty    -0.14
    erva     -0.14
    ding     -0.14
    ally     -0.14
    Positive Logits
    ively         0.17
    insky         0.16
    次数          0.15
    iveness       0.15
    InProgress    0.15
    orney         0.15
    &T            0.14
    able          0.14
    ilent         0.14
    ademic        0.14
    Act Density 0.048%

    No Known Activations