INDEX
    Explanations

    references to violent acts or conflicts

    Negative Logits
    ality      -0.18
    esta       -0.17
    olu        -0.15
    faction    -0.15
    ally       -0.15
    .au        -0.15
    aled       -0.14
    owie       -0.14
    arity      -0.14
    erator     -0.14
    Positive Logits
    ively      0.19
    IVEN       0.16
    kre        0.15
    /mock      0.15
    iveness    0.15
    ersh       0.15
    ademic     0.15
    able       0.14
    robe       0.14
    erson      0.14
    Activation Density: 0.046%

    No Known Activations