INDEX
Explanations
references to transgender individuals and related societal attitudes
Negative Logits
findpost          -0.57
ysuckle           -0.44
RegressionTest    -0.44
-};               -0.44
featureID         -0.43
surla             -0.43
chré              -0.43
iconLine          -0.42
cerpt             -0.42
hozz              -0.42
Positive Logits
transgender       0.88
gender            0.71
gender            0.64
trans             0.63
Gender            0.56
Gender            0.54
trans             0.49
Trans             0.49
⚧                 0.49
bisexual          0.48
Activations Density: 0.109%