INDEX
Explanations
names of specific locations and organizations
Neuron Alignment
Index   Value   % of L₁
1967    +0.18   0.6%
2034    +0.11   0.4%
382     +0.10   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
382     +0.18      0.06
1967    +0.11      0.04
1708    +0.10      0.04
Negative Logits
Token    Logit
dises    -1.86
mef      -1.80
dispen   -1.73
emphat   -1.73
affez    -1.69
sappi    -1.69
fta      -1.66
applau   -1.65
lyon     -1.65
ordina   -1.65
Positive Logits
Token    Logit
and      0.85
for      0.83
as       0.83
during   0.82
in       0.82
which    0.82
while    0.81
to       0.80
by       0.79
with     0.79
Activations Density: 0.284%