Explanations
mentions of political figures and their titles
references to politicians or congressional representatives
Negative Logits
Token            Logit
decomp           -0.68
gor              -0.66
gratification    -0.66
lessons          -0.65
dehuman          -0.65
recomp           -0.65
maturity         -0.65
relat            -0.64
equival          -0.63
factors          -0.63
Positive Logits
Token            Logit
)'               1.02
)]               0.93
.),              0.86
)                0.85
TX               0.84
),               0.80
iciary           0.79
NJ               0.79
aucus            0.79
wich             0.79
Activations Density 0.031%