INDEX
Explanations
instances of high numerical values or strong emphasis on certain metrics
Neuron Alignment
Index   Value   % of L₁
472     +0.12   0.7%
281     +0.12   0.6%
198     +0.12   0.6%
Correlated Neurons
Index   P. Corr.   Cos Sim.
438     +0.12      0.40
53      +0.12      0.42
56      +0.12      0.11
Negative Logits
ettes      -2.05
etting     -1.69
ets        -1.64
aku        -1.52
ERS        -1.50
United     -1.47
une        -1.46
upcoming   -1.42
IRS        -1.42
oku        -1.40
Positive Logits
ClCompile      2.02
consequence    1.72
ferent         1.70
neut           1.65
respect        1.61
Cause          1.54
stit           1.53
\%             1.52
care           1.50
============   1.47
Activations Density 4.670%