INDEX
Explanations
phrases indicating risks or hazards related to health and safety
Negative Logits
جÙĪ           -0.17
isoft         -0.16
omap          -0.16
irty          -0.16
-à¤ħ          -0.15
acades        -0.15
reservation   -0.15
');?>"        -0.15
iets          -0.15
shr           -0.14
Positive Logits
pose          0.23
idon          0.23
problems      0.23
challenges    0.21
questions     0.21
serious       0.20
threat        0.19
pose          0.19
challenge     0.18
risks         0.18
Activations Density: 0.013%