Explanations
phrases related to posing risks or threats
phrases indicating potential threats or risks
Negative Logits
ciples      -0.74
Manufact    -0.72
cknowled    -0.70
essen       -0.68
mith        -0.68
azines      -0.66
endars      -0.65
enson       -0.65
write       -0.65
£ı          -0.64
Positive Logits
threat       1.31
hazard       1.27
danger       1.20
risk         1.15
risks        1.07
challenge    1.05
dangers      1.02
hazards      0.98
hurdle       0.98
peril        0.98
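The logit lists are usually read as the feature's direct effect on next-token prediction, which is why the positive side (threat, hazard, danger, risk) lines up with the explanations. Below is a minimal sketch of how such lists can be produced. It assumes each value is the dot product of the feature's decoder direction with a token's unembedding vector and ignores the model's final layer norm. The names `decoder_vec`, `W_U`, and the helper functions are hypothetical, and this is not necessarily how this dashboard computed its numbers.

```python
# Hypothetical sketch: rank vocabulary tokens by a feature's direct effect on
# the output logits. decoder_vec and W_U are illustrative names, and the
# model's final layer norm is ignored.

def logit_effects(decoder_vec, W_U):
    """Project a decoder direction (length d_model) through an unembedding
    matrix given as d_model rows, each of length vocab_size."""
    vocab_size = len(W_U[0])
    return [
        sum(decoder_vec[r] * W_U[r][j] for r in range(len(decoder_vec)))
        for j in range(vocab_size)
    ]


def top_and_bottom_tokens(decoder_vec, W_U, vocab, k=10):
    """Return the k most positive and the k most negative (token, logit) pairs."""
    logits = logit_effects(decoder_vec, W_U)
    ascending = sorted(range(len(vocab)), key=lambda j: logits[j])
    negative = [(vocab[j], logits[j]) for j in ascending[:k]]
    positive = [(vocab[j], logits[j]) for j in ascending[::-1][:k]]
    return positive, negative


# Toy example: d_model = 2, vocabulary of 4 tokens.
vocab = ["threat", "write", "risk", "mith"]
W_U = [
    [2.0, -1.0, 0.5, -3.0],
    [0.0, 0.0, 0.0, 0.0],
]
decoder_vec = [1.0, 0.0]
positive, negative = top_and_bottom_tokens(decoder_vec, W_U, vocab, k=2)
print(positive)  # [('threat', 2.0), ('risk', 0.5)]
print(negative)  # [('mith', -3.0), ('write', -1.0)]
```

In a real model the unembedding has tens of thousands of columns, so a vectorized matrix product would replace the nested loops. The ranking logic stays the same.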
Activation Density: 0.155%
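Activation density indicates how often the feature fires. At 0.155%, that is roughly 1.55 tokens in every 1,000. Below is a minimal sketch, assuming density means the fraction of sampled tokens on which the feature's activation is strictly above zero. The function name and the `threshold` parameter are hypothetical.

```python
# Hypothetical sketch, assuming "density" is the fraction of sampled tokens on
# which the feature's activation is strictly above a threshold (default 0).

def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds threshold."""
    if not activations:
        return 0.0
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# 155 firing tokens out of 100,000 sampled tokens gives a density of 0.155%.
acts = [0.0] * 100_000
for i in range(155):
    acts[i * 600] = 1.0  # arbitrary positive activations at spread-out positions
density = activation_density(acts)
print(f"{density:.3%}")  # 0.155%
```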