INDEX
Explanations
terms associated with justification and self-defense in conflict scenarios
Threat, danger, or potential harm
threat and danger
Negative Logits
<>",             -0.74
ModelExpression  -0.65
]--;             -0.65
igraphic         -0.61
Thebes           -0.60
CommonModule     -0.60
צלחה             -0.58
ativität         -0.57
ulite            -0.55
WebDriverWait    -0.54
Positive Logits
threat       0.81
threatening  0.77
harmless     0.76
Geplaatst    0.73
Threat       0.70
disarm       0.69
Threat       0.69
menacing     0.69
danger       0.69
threats      0.69
Activations Density: 0.292%