INDEX
Explanations
references to the word "Black" in various contexts
Neuron Alignment
Index    Value    % of L₁
156      +0.16    0.9%
169      +0.12    0.7%
452      +0.12    0.7%
Correlated Neurons
Index    P. Corr.    Cos Sim.
452      +0.16       0.02
1        +0.12       0.02
506      +0.12       0.02
Negative Logits
Token    Logit
·¸       -3.44
Ĭ        -3.26
ĺ        -3.18
´        -3.17
ĥ½       -3.16
¬        -3.12
Ĺ        -3.02
Ń        -3.01
ĭ        -2.99
Ħ        -2.90
Positive Logits
Token         Logit
pool          1.93
bank          1.89
strand        1.88
Berry         1.83
stown         1.67
nature        1.67
employment    1.65
doll          1.52
jack          1.51
water         1.50
Activations Density 0.040%