INDEX
    Explanations

    Terms associated with justification and self-defense in conflict scenarios

    Threat, danger, or potential harm

    Negative Logits
    Token             Logit
     <>",             -0.74
     ModelExpression  -0.65
    ]--;              -0.65
    igraphic          -0.61
     Thebes           -0.60
    CommonModule      -0.60
    צלחה              -0.58
    ativität          -0.57
    ulite             -0.55
     WebDriverWait    -0.54
    Positive Logits
    Token             Logit
     threat           0.81
     threatening      0.77
     harmless         0.76
    Geplaatst         0.73
    Threat            0.70
     disarm           0.69
     Threat           0.69
     menacing         0.69
     danger           0.69
     threats          0.69
    Activation Density: 0.292%
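
    A density figure like the one above is commonly read as the fraction of sampled tokens on which the feature fires at all. The sketch below illustrates that arithmetic under that assumption. The function name `activation_density`, the zero threshold, and the sample data are all hypothetical and are not taken from this page or from any particular tool.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means (active tokens) / (all sampled tokens).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 1,000 tokens, 3 of which activate the feature.
sample = [0.0] * 997 + [1.2, 0.4, 2.7]
print(f"{activation_density(sample):.3%}")
```

    On this reading, a density of 0.292% would correspond to the feature firing on roughly 3 of every 1,000 sampled tokens.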

    No Known Activations