INDEX
Explanations
references to threats and risks in various contexts
Negative Logits
Token            Logit
disappointing    -0.14
ÑĢап             -0.13
disappointed     -0.13
IDI              -0.12
disappoint       -0.12
زÙĩ              -0.12
rozh             -0.12
illy             -0.12
PCR              -0.11
ç«               -0.11
Positive Logits
Token            Logit
threat           0.65
threats          0.60
danger           0.56
threat           0.56
Threat           0.52
-threat          0.52
dangers          0.48
threatening      0.46
menace           0.46
threaten         0.45
Activations Density: 0.157%
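
The page does not say how the density figure is computed. A minimal sketch, assuming it is the fraction of sampled tokens on which the feature's activation is above zero, is shown below. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and chosen only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the feature
    fires at all; the source page does not confirm this definition.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 157 of 100,000 tokens.
sample = [1.0] * 157 + [0.0] * 99_843
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.157%
```

Under this reading, a density of 0.157% means the feature is active on roughly 1.6 tokens per thousand, which fits a narrow feature tied to threat and danger vocabulary.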