INDEX
Explanations
names of locations, especially Washington, D.C.
Neuron Alignment
Index   Value   % of L₁
699     +0.13   0.4%
492     +0.12   0.4%
1272    +0.11   0.4%
Correlated Neurons
Index   P. Corr.   Cos Sim.
699     +0.13      0.04
492     +0.12      0.04
144     +0.11      0.03
Negative Logits
ftu      -1.05
reft     -0.99
thut     -0.97
fto      -0.96
fep      -0.95
fte      -0.95
perfon   -0.95
nece     -0.94
leaft    -0.93
peppa    -0.92
Positive Logits
DC           1.27
DC           1.13
Washington   1.06
Washington   0.99
dc           0.97
dc           0.96
WASHINGTON   0.87
DCs          0.84
DCs          0.80
WASHINGTON   0.78
Activations Density 0.096%