INDEX
Explanations
mentions of specific company names and authoritative figures giving instructions
Neuron Alignment
Index   Value   % of L₁
674     +0.10   0.3%
1320    +0.07   0.2%
326     +0.07   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
659     +0.10      0.04
1088    +0.07      0.04
656     +0.07      0.03
Negative Logits
UREAU       -0.69
ungalow     -0.60
quæ         -0.56
Vikipedi    -0.54
etermined   -0.53
Rè          -0.53
cæ          -0.50
caufe       -0.47
SOBRE       -0.45
²(          -0.45
Positive Logits
done      0.72
done      0.64
DONE      0.60
Done      0.57
doing     0.54
doing     0.54
Doing     0.53
xdrive    0.52
Doing     0.52
chapeau   0.51
Activations Density 0.239%