INDEX
Explanations
terms related to politics
references to politics
Negative Logits
Token        Logit
FIG          -0.87
Item         -0.69
uin          -0.66
ii           -0.66
heim         -0.65
Figure       -0.65
Instruct     -0.65
Amount       -0.64
isode        -0.63
Additional   -0.62
Positive Logits
Token        Logit
politics     3.71
Politics     2.56
politics     2.53
Politics     2.23
political    1.89
political    1.80
politic      1.73
polit        1.71
politicians  1.68
Political    1.65
Activations Density 0.017%
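
The density figure above is not defined on this page. A minimal sketch, assuming it means the fraction of token positions at which the feature's activation is above zero, is shown below. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and serve only as illustration; they are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" is the share of token positions on which the
    feature fires at all; the dashboard itself does not state its definition.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: the feature fires on 1 of 10,000 token positions.
sample = [0.0] * 9999 + [4.2]
print(f"{activation_density(sample):.3%}")
```

Under this reading, a density of 0.017% would correspond to the feature firing on roughly 17 of every 100,000 tokens.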