Explanations
references to political parties
mentions of political parties
Negative Logits
Tile         -0.73
Wheat        -0.73
Drake        -0.70
angelo       -0.69
DOI          -0.69
alam         -0.68
Deter        -0.68
htaking      -0.66
Monroe       -0.65
Ridge        -0.64
Positive Logits
affiliation   0.93
leader        0.91
Leader        0.87
goers         0.83
insiders      0.82
leader        0.82
affili        0.80
agogue        0.80
advoc         0.79
leaders       0.79
Activations Density 0.032%