Explanations
phrases related to hazardous situations and potential injuries
Negative Logits
Token       Logit
污          -0.15
Äįan        -0.15
ãĥ¼ãĥģ      -0.15
Leaks       -0.15
Affected    -0.14
ãĥ¼ãĤ¿      -0.14
ipples      -0.14
ekim        -0.14
tml         -0.14
),$         -0.14
Positive Logits
Token       Logit
danger      0.78
dangerous   0.73
dangers     0.71
danger      0.66
Danger      0.65
-danger     0.62
Dangerous   0.60
Danger      0.60
hazardous   0.60
hazard      0.58
Activations Density: 0.359%