INDEX
Explanations
phrases related to depth or intensity
Neuron Alignment
Index   Value   % of L₁
1413    +0.13   0.5%
871     +0.13   0.4%
687     +0.12   0.4%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1413    +0.13      0.03
687     +0.13      0.03
869     +0.12      0.03
Negative Logits
Slag       -0.56
encomp     -0.54
apprehen   -0.54
affor      -0.52
attemp     -0.49
Böh        -0.49
Blat       -0.47
erad       -0.46
crus       -0.46
osu        -0.46
Positive Logits
deep       1.14
Deep       1.10
Deep       1.10
deep       1.09
DEEP       1.01
depths     0.96
DEEP       0.95
depth      0.94
depth      0.93
deeper     0.92
Activations Density: 0.102%