INDEX
Explanations
references to the United States
Neuron Alignment
Index    Value    % of L₁
348      +0.13    0.8%
60       +0.11    0.6%
362      +0.10    0.6%
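
The page does not say how the "% of L₁" column is derived. One plausible reading, offered here only as an assumption, is that each neuron's share is the absolute value of its weight divided by the L₁ norm of the whole feature direction. Under that reading, a value of 0.13 at 0.8% would put the L₁ norm at roughly 16. The function name `neuron_alignment` and its parameters are hypothetical and do not come from the dashboard.

```python
def neuron_alignment(direction, top_k=3):
    """Rank neurons by |weight| and report each one's share of the L1 norm.

    Assumption: "% of L1" means |w_i| / sum_j |w_j|. The dashboard does not
    confirm this definition.
    """
    l1_norm = sum(abs(w) for w in direction)
    if l1_norm == 0:
        return []  # an all-zero direction has no meaningful shares

    # Indices sorted by absolute weight, largest first (ties keep input order).
    ranked = sorted(range(len(direction)),
                    key=lambda i: abs(direction[i]),
                    reverse=True)

    # (index, signed value, fraction of the L1 norm)
    return [(i, direction[i], abs(direction[i]) / l1_norm)
            for i in ranked[:top_k]]


if __name__ == "__main__":
    # Toy direction, not data from the dashboard.
    for index, value, share in neuron_alignment([0.5, -0.25, 0.0, 0.25], top_k=2):
        print(f"{index:<6}{value:+.2f}  {share:.1%}")
```

The signed value is kept alongside the share so the output has the same three columns as the table above.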
Correlated Neurons
Index    P. Corr.    Cos Sim.
348      +0.13       0.03
208      +0.11       0.02
78       +0.10       0.01
Negative Logits
Token           Logit
complete        -1.96
true            -1.88
mean            -1.86
immediate       -1.64
respectively    -1.57
awning          -1.56
truly           -1.56
late            -1.50
empty           -1.50
normal          -1.49
Positive Logits
Token           Logit
cript           2.23
ści             1.88
creen           1.86
enos            1.86
bases           1.73
ygen            1.71
ière            1.66
idelines        1.65
arsenal         1.65
volt            1.65
Activations Density 4.841%