Explanations
phrases related to potential harm or danger to individuals or society
concerns related to the impact of actions on public health and safety
Negative Logits
Token          Logit
irection       -0.65
brisk          -0.64
boast          -0.61
caveats        -0.60
bluff          -0.59
authoritative  -0.59
manoeuv        -0.55
flurry         -0.55
motions        -0.54
suggestive     -0.54
Positive Logits
Token          Logit
financially    0.94
unborn         0.86
ankind         0.85
downstream     0.83
wellbeing      0.83
economically   0.82
elector        0.81
harmed         0.81
livelihood     0.79
outwe          0.79
Activations Density: 0.380%