INDEX
Explanations
references to the White House and its activities or context
Negative Logits
nt        -0.16
AMES      -0.16
name      -0.15
athy      -0.15
nd        -0.15
nf        -0.15
uber      -0.14
size      -0.14
server    -0.14
uet       -0.14
Positive Logits
hall      0.24
-collar   0.22
caps      0.21
legg      0.21
aker      0.20
hurst     0.20
-hot      0.19
Plains    0.19
haven     0.19
head      0.19
Activations Density: 0.013%