Explanations
references to government administration, particularly the Trump administration
Negative Logits
oth        -0.15
åĢij       -0.15
pec        -0.15
å°ĺ        -0.15
astics     -0.14
loth       -0.14
-Based     -0.14
ÑĤаб       -0.14
eed        -0.14
UTURE      -0.14
Positive Logits
Interr      0.16
inea        0.15
Ã¥r         0.15
comb        0.15
eware       0.14
ships       0.14
czy         0.14
кÑĤа        0.14
_DIP        0.14
Dir         0.13
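Logit lists like the two above are commonly computed as the feature's direct effect on the output vocabulary: its decoder direction is projected through the model's unembedding matrix, and the most negative and most positive entries are reported. The sketch below is a minimal illustration under that assumption. `top_logit_effects`, `w_dec_row`, and `W_U` are hypothetical names, and this dashboard's exact pipeline (for example, whether a final LayerNorm scale is applied first) is not stated here.

```python
import numpy as np

def top_logit_effects(w_dec_row, W_U, vocab, k=10):
    """Hypothetical helper: direct logit effect of one feature.

    w_dec_row: (d_model,) decoder vector for the feature (assumed input)
    W_U:       (d_model, d_vocab) unembedding matrix (assumed input)
    vocab:     list of d_vocab token strings
    Returns (top_negative, top_positive) as lists of (token, logit) pairs.
    """
    logits = np.asarray(w_dec_row) @ np.asarray(W_U)  # (d_vocab,) effect per token
    order = np.argsort(logits)                        # ascending by logit
    top_negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    top_positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return top_negative, top_positive

# Toy usage with an identity unembedding, so each logit equals one decoder entry.
neg, pos = top_logit_effects(np.array([0.2, -0.1, 0.05]), np.eye(3), ["a", "b", "c"], k=2)
```

With real weights, `vocab` holds raw tokenizer strings, which is why byte-level tokens such as `Ã¥r` appear undecoded in the lists.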
Activation density: 0.025%
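Activation density is usually read as the fraction of sampled tokens on which the feature fires. A density of 0.025% is 0.00025, or about 1 token in 4,000. The sketch below assumes "fires" means an activation strictly above zero. The helper name `activation_density` and the `threshold` parameter are hypothetical, and the dashboard's actual threshold and token sample are not given here.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Hypothetical helper: fraction of tokens where the feature is active.

    acts: 1-D array of this feature's activation on every sampled token.
    threshold: activations strictly above this count as firing (assumed 0).
    """
    acts = np.asarray(acts, dtype=float)
    return float((acts > threshold).mean())

# Toy usage: one active token out of 4,000 gives a density of 0.025%.
acts = np.zeros(4000)
acts[7] = 1.3
density = activation_density(acts)
```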