INDEX
Explanations
phrases or words related to unsafe situations or actions
references to safety and the concept of being unsafe
Negative Logits
orah         -0.85
cence        -0.84
phasis       -0.81
ership       -0.80
braska       -0.80
bernatorial  -0.80
gdala        -0.80
ophers       -0.80
zzo          -0.80
thood        -0.76
Positive Logits
unsafe       1.04
adolesc      0.78
nesses       0.72
unle         0.67
NESS         0.67
hazardous    0.66
ÃįÃį         0.62
Ukrain       0.62
unhealthy    0.61
IED          0.60
Activations Density 0.016%
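
The density figure above can be read as the share of sampled token positions on which the feature is active. The page does not state how it was computed, so the following is only a minimal sketch under that assumption: the function name `activation_density`, the zero threshold, and the sample data are all hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of sampled tokens on which the
    feature fires; the source page does not define the term.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 10,000 token positions, the feature fires on 2 of them.
sample = [0.0] * 9998 + [1.3, 0.4]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.016% would correspond to the feature firing on roughly 16 of every 100,000 sampled tokens.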