Explanations
terms related to various forms of discrimination or anti-group sentiments
Negative Logits
iless      -0.15
elon       -0.15
ListOf     -0.14
antom      -0.14
egov       -0.14
ego        -0.14
Roose      -0.14
ohl        -0.13
ãĥ³ãĤ¸     -0.13
egen       -0.13
Positive Logits
sentiment   0.35
sentiments  0.30
Sent        0.27
measures    0.23
Sent        0.23
forces      0.21
stance      0.21
Measures    0.20
sent        0.19
activity    0.19
Activations Density 0.040%
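
One common reading of an activation density figure like the one above is the share of sampled tokens on which the feature has a nonzero activation. Under that assumption, 0.040% corresponds to roughly 4 tokens in every 10,000. The sketch below illustrates the calculation; the function name `activation_density`, the threshold of zero, and the sample data are all hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of entries strictly above `threshold`.

    Assumption: "density" means the fraction of token positions on which
    the feature fires, expressed as a percentage.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: 10,000 token positions, 4 of which activate the feature.
sample = [0.0] * 10_000
for position in (12, 345, 6_789, 9_001):
    sample[position] = 1.5

density = activation_density(sample)
print(f"{density:.3f}%")
```

A density this low would suggest a sparse feature that fires only in a narrow set of contexts, which fits the specific explanation given above.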