INDEX
Explanations
descriptions of negative societal attitudes towards homosexuality
Negative Logits
ramid       -0.63
si          -0.62
akening     -0.59
ulia        -0.59
raviolet    -0.59
ndum        -0.59
gradation   -0.58
rame        -0.57
ournal      -0.57
acus        -0.57
Positive Logits
surprises    0.89
interesting  0.70
fresh        0.68
colorful     0.67
goodies      0.66
fascinating  0.66
curiosity    0.65
calories     0.65
holes        0.63
colourful    0.63
Activations Density: 17.210%