Explanations
political affiliations and characteristics such as race, religion, and legality
politically charged terms related to party affiliations and demographics
Negative Logits
ĺħ            -0.77
stals         -0.74
redits        -0.70
confidence    -0.70
omics         -0.70
igers         -0.70
ainer         -0.69
runners       -0.68
earchers      -0.68
center        -0.67
Positive Logits
sounding      1.00
isable        0.94
izable        0.83
istic         0.78
ifiable       0.76
idious        0.73
nature        0.73
manner        0.73
crowds        0.71
offspring     0.69
Activations Density: 0.303%