Explanations
proper nouns and email addresses
Neuron Alignment
Index   Value   % of L₁
50      +0.17   1.1%
554     +0.11   0.6%
397     +0.08   0.5%
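The Neuron Alignment columns can be read as follows, under an assumption this page does not state: "Value" is a component of the feature's decoder vector in the neuron basis, and "% of L₁" is that component's absolute value divided by the vector's L₁ norm. The sketch below is a minimal illustration of that reading. The function name `neuron_alignment` and the toy vector are hypothetical and are not this feature's real weights.

```python
# Hypothetical sketch: rank neurons by their component in a feature's
# decoder vector and report each one as a share of the vector's L1 norm.
# ASSUMPTION: this is how "Value" and "% of L1" are defined; the toy
# vector below is invented for illustration only.

def neuron_alignment(decoder_row, top_k=3):
    """Return (index, value, percent_of_l1) for the top_k largest components."""
    l1 = sum(abs(w) for w in decoder_row)  # L1 norm of the whole vector
    ranked = sorted(range(len(decoder_row)),
                    key=lambda i: decoder_row[i], reverse=True)
    return [(i, decoder_row[i], 100.0 * abs(decoder_row[i]) / l1)
            for i in ranked[:top_k]]

toy_row = [0.02, -0.05, 0.17, 0.01, 0.11, -0.03, 0.08, 0.03]
for idx, value, pct in neuron_alignment(toy_row):
    print(f"{idx:>5}  {value:+.2f}  {pct:.1f}%")
```

Because a real decoder vector spreads its mass over thousands of neurons, even the largest components are a small share of the L₁ norm. That would be consistent with the roughly 1% figures above.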
Correlated Neurons
Index   Pearson Corr.   Cosine Sim.
484     +0.17           0.03
554     +0.11           0.03
240     +0.08           0.03
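The two Correlated Neurons metrics are standard, and the sketch below shows how each is computed. It assumes, without confirmation from this page, that "Pearson Corr." correlates the feature's activations with a neuron's activations over the same tokens, and that "Cosine Sim." compares two weight vectors. The function names and inputs are illustrative.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (e.g. per-token activations)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return math.nan  # undefined when either sequence is constant
    return cov / (sx * sy)

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (e.g. two weight directions)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

# Toy per-token activations, invented for illustration.
feature_acts = [0.0, 2.0, 0.0, 3.0]
neuron_acts = [0.1, 1.0, 0.2, 1.4]
print(pearson(feature_acts, neuron_acts))
```

The two measures are independent. A neuron can co-fire with a feature (nonzero correlation) while its weight direction is nearly orthogonal to the feature's direction (cosine near zero).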
Negative Logits
<bos>            -3.60
ⓧ                -0.87
AssemblyCompany  -0.77
/***             -0.72
addComponent     -0.71
قایناقلار        -0.70
protected        -0.69
HasIndex         -0.69
EndProject       -0.67
šech             -0.67
Positive Logits
affor            2.08
impra            2.08
increa           2.02
maneu            1.92
volunte          1.88
reluct           1.82
stockholm        1.81
unlaw            1.80
inev             1.79
Keny             1.78
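The two logit lists above could be produced as sketched below, assuming (this page does not say so) that each token's score is the dot product of the feature's decoder direction with that token's column of the unembedding matrix. Under that assumption, the lowest scores form the negative list and the highest form the positive list. The function name, the toy matrix, and the four-token vocabulary are hypothetical.

```python
# Hypothetical sketch: project a feature direction through an unembedding
# matrix and return the k most negative and k most positive tokens.
# ASSUMPTION: unembed is stored as d_model rows, each of length vocab_size.

def logit_effects(decoder_row, unembed, vocab, k=3):
    scores = [
        sum(decoder_row[d] * unembed[d][t] for d in range(len(decoder_row)))
        for t in range(len(vocab))
    ]
    order = sorted(range(len(vocab)), key=lambda t: scores[t])  # ascending
    negative = [(vocab[t], scores[t]) for t in order[:k]]
    positive = [(vocab[t], scores[t]) for t in reversed(order[-k:])]
    return negative, positive

decoder_row = [1.0, -0.5]
unembed = [[2.0, 0.0, -1.0, 0.5],
           [0.0, 1.0, 2.0, -1.0]]
vocab = ["a", "b", "c", "d"]
neg, pos = logit_effects(decoder_row, unembed, vocab, k=2)
print(neg)  # [('c', -2.0), ('b', -0.5)]
print(pos)  # [('a', 2.0), ('d', 1.0)]
```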
Activation Density: 0.053%
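The density figure is presumably the percentage of tokens on which the feature is active, meaning its activation is above zero. At 0.053%, that would be roughly 53 tokens in every 100,000. A minimal sketch under that assumption follows; the function name and the threshold parameter are illustrative.

```python
# Hypothetical sketch: percentage of tokens whose activation exceeds a threshold.
# ASSUMPTION: "active" means activation > 0, matching the usual SAE convention.

def activation_density(activations, threshold=0.0):
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)

print(activation_density([0.0, 0.0, 1.5, 0.0]))  # 25.0
```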