INDEX
Explanations
references to social and political groups, particularly focusing on issues affecting various communities
Negative Logits
lessly      -0.22
ings        -0.19
adoo        -0.17
usc         -0.15
TURE        -0.15
ful         -0.14
lessness    -0.14
oner        -0.13
less        -0.13
ively       -0.13
Positive Logits
-American     0.23
-Americans    0.20
/Linux        0.19
-Benz         0.19
/OR           0.19
.gov          0.18
/AIDS         0.18
/mac          0.17
berger        0.16
/MIT          0.16
Activations Density: 0.289%