INDEX
Explanations
phrases related to a controversial or offensive event and the subsequent actions taken
instances of apologies and acknowledgment of mistakes
Negative Logits
cells         -0.88
trillions     -0.83
omics         -0.82
cures         -0.81
arbon         -0.81
fert          -0.76
Recovery      -0.75
forecasts     -0.75
forecasting   -0.74
savings       -0.73
Positive Logits
offended         1.50
offending        1.33
homophobic       1.33
objectionable    1.32
Yiannopoulos     1.28
boycott          1.26
disrespectful    1.23
sexist           1.21
slurs            1.20
discriminatory   1.18
Activations Density: 1.327%