Explanations
phrases that indicate potential threats or dangers
posed danger or threat
Negative Logits
Token            Logit
thâu             -0.47
Band             -0.43
disambiguazione  -0.41
ANNES            -0.41
démocr           -0.40
bandoulière      -0.40
Interpre         -0.40
indro            -0.40
judiciaire       -0.40
껏               -0.39
Positive Logits
Token            Logit
threat           1.10
threats          1.02
threat           1.00
Threat           0.98
danger           0.98
Threat           0.96
Threats          0.93
amenaza          0.90
Gefahr           0.87
Threats          0.87
Activations Density: 0.066%
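
The density figure presumably reports the share of sampled token positions on which this feature has a nonzero activation. The sketch below illustrates that reading on made-up data. The function name `activation_density`, its `threshold` parameter, and the sample values are hypothetical and are not taken from the dashboard or from any particular library.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions on which the
    feature fires at all. The dashboard does not state its exact definition.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 2 of 3,000 token positions.
sample = [0.0] * 2998 + [1.7, 0.4]
density = activation_density(sample)

# Format as a percentage, the way the dashboard displays the value.
print(f"Activations Density: {density:.3%}")
```

Under this reading, a density of 0.066% would mean the feature fires on roughly 66 of every 100,000 token positions, which fits a feature tied to a narrow topic such as threats and danger.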