INDEX
Explanations
mentions of homophobic or anti-gay language, actions, or sentiments
references to anti-gay sentiments and legislation
Negative Logits
Dur                 -0.70
Chiefs              -0.70
Grateful            -0.69
Bei                 -0.67
Planes              -0.65
Solitaire           -0.64
externalActionCode  -0.63
CentOS              -0.63
Nap                 -0.62
ç«                  -0.61
Positive Logits
osexual             0.87
zilla               0.86
tons                0.83
nor                 0.82
ega                 0.81
erness              0.80
cation              0.79
gay                 0.78
holes               0.75
ther                0.75
Activations Density: 0.012%