Explanations
words related to threats or harm
phrases indicating potential threats or harm to individuals or society
Negative Logits
soDeliveryDate  -0.89
ihad            -0.71
arten           -0.70
tions           -0.67
need            -0.66
anooga          -0.63
sponsored       -0.62
iHUD            -0.61
Cosponsors      -0.60
hent            -0.60
Positive Logits
injure          0.86
adies           0.78
ensure          0.76
compensate      0.76
avoid           0.75
asted           0.75
assist          0.74
enhance         0.74
reduce          0.74
achieve         0.73
Activations Density 0.167%