INDEX
Explanations
elements related to political positions or roles
Negative Logits
ateria     -0.08
_ISO       -0.08
oog        -0.08
/board     -0.08
erguson    -0.08
قب         -0.08
strr       -0.08
měr        -0.07
士         -0.07
nip        -0.07
Positive Logits
(not rendered)  0.07
ca              0.06
ca              0.06
Committee       0.06
serve           0.06
oor             0.05
motion          0.05
bills           0.05
lla             0.05
modern          0.05
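Dashboards like this one commonly produce the two logit lists by multiplying the feature's decoder direction by the model's unembedding matrix and keeping the most negative and most positive entries. The page does not say how its lists were computed, so the following is only a sketch under that assumption. The function `logit_effects` and the toy inputs are hypothetical, and the sketch ignores any final layer norm a real model applies before unembedding.

```python
from typing import List, Sequence, Tuple


def logit_effects(
    decoder_dir: Sequence[float],
    unembed: Sequence[Sequence[float]],  # hypothetical layout: [d_model][vocab]
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (k most negative, k most positive) tokens by direct logit effect.

    Assumes the effect on token j is the dot product of the feature's
    decoder direction with column j of the unembedding matrix.
    """
    d_model = len(decoder_dir)
    # Effect on each vocabulary token: sum_i decoder_dir[i] * unembed[i][j]
    effects = [
        sum(decoder_dir[i] * unembed[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    # Sort ascending so the most negative effects come first
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    negative = ranked[:k]
    positive = ranked[::-1][:k]  # reverse to get the most positive first
    return negative, positive


# Toy example: a 2-dimensional residual stream and a 3-token vocabulary
neg, pos = logit_effects(
    [1.0, 0.0],
    [[0.5, -0.2, 0.1], [9.0, 9.0, 9.0]],
    ["a", "b", "c"],
    k=1,
)
print(neg, pos)
```

With a real model, `decoder_dir` would be the feature's row of the SAE decoder and `unembed` the model's unembedding matrix; pure Python is used here only to keep the example dependency-free.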
Activation Density 0.015%
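Activation density is the share of token positions on which the feature fires. A minimal sketch, assuming that "fires" means an activation strictly above a hypothetical `threshold` of 0.0 and that the density is reported as a percentage:

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total


# 15 firing positions out of 100,000 tokens
print(activation_density([0.0] * 99_985 + [1.0] * 15))
```

At 0.015%, the feature fires on roughly 15 of every 100,000 tokens, so it is very sparse.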