INDEX
Explanations
instances of words related to social issues and controversies, particularly those related to justice, power, and evidence
terms and phrases related to discomfort and societal issues
Negative Logits
Tes                -0.73
emale              -0.64
Ultra              -0.63
Teen               -0.61
Ay                 -0.60
cientious          -0.59
joining            -0.59
senal              -0.58
Seventh            -0.57
Congratulations    -0.57
Positive Logits
ain                0.81
"?                 0.77
?                  0.76
refers             0.74
!?                 0.74
...?               0.73
???                0.72
":["               0.72
?!                 0.72
equals             0.71
Activations Density 0.677%