Explanations
references to specific publications and their contributors
Negative Logits
Token         Logit
ibs           -0.18
Manhattan     -0.17
NYC           -0.16
.NewReader    -0.15
>NN           -0.15
Harlem        -0.15
adiens        -0.15
ensi          -0.14
ätt           -0.14
"user         -0.14
Positive Logits
Token         Logit
Washington    0.34
washington    0.33
Washington    0.30
DC            0.28
DC            0.28
dc            0.27
Redskins      0.27
ashington     0.26
ASHINGTON     0.24
Wa            0.23
Activations Density 0.016%