INDEX
Explanations
specific items or entities mentioned in a list
references to social and political issues involving marginalization or exclusion
Negative Logits
ãĥį          -0.85
exclusive    -0.69
\/\/         -0.65
Beg          -0.63
ym           -0.60
ãĥ©ãĥ³       -0.60
HuffPost     -0.60
ãĥ«          -0.60
Ö¼           -0.60
Piper        -0.59
Positive Logits
etc          1.62
blah         1.08
â̦)          1.07
etc          1.02
...)         1.02
â̦           0.95
,...         0.89
â̦           0.88
â̦.          0.85
ect          0.83
Activations Density 0.399%