INDEX
Explanations
references to software tools and their functionalities
Neuron Alignment
Index    Value    % of L₁
156      +0.20    1.2%
295      +0.14    0.8%
184      +0.13    0.7%
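The "% of L₁" column can be read as each neuron's share of the L₁ norm of the feature's direction vector. The sketch below illustrates that reading only. It assumes the feature direction is available as a plain list of per-neuron weights and that neurons are ranked by absolute weight. The function name `neuron_alignment` and its parameters are hypothetical and not taken from the dashboard's own code.

```python
def neuron_alignment(direction, k=3):
    """Return the top-k neurons as (index, value, share of L1 norm).

    Assumption: `direction` holds the feature's weight on each neuron,
    and "% of L1" means |value| divided by the sum of |weights|.
    """
    l1_norm = sum(abs(w) for w in direction)
    if l1_norm == 0:
        raise ValueError("direction has zero L1 norm")
    # Rank neuron indices by absolute weight, largest first (assumed ordering).
    ranked = sorted(range(len(direction)),
                    key=lambda i: abs(direction[i]),
                    reverse=True)
    return [(i, direction[i], abs(direction[i]) / l1_norm) for i in ranked[:k]]


if __name__ == "__main__":
    # Toy direction vector, not real feature data.
    for index, value, share in neuron_alignment([0.5, -0.25, 0.125, 0.125], k=2):
        print(f"{index}\t{value:+.2f}\t{share:.1%}")
```

Under this reading, a value of +0.20 at 1.2% implies an L₁ norm of roughly 17 for the whole direction, which is consistent with the other two rows once rounding is taken into account.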
Correlated Neurons
Index    Pearson Corr.    Cos Sim.
295      +0.20            0.02
184      +0.14            0.03
349      +0.13            0.02
Negative Logits
Token     Logit
á̝        -2.18
áĢº       -2.16
ureus     -1.80
vow       -1.61
á̬        -1.55
ãģ¾ãģĽ    -1.54
woke      -1.51
ität      -1.49
births    -1.45
ahoma     -1.45
Positive Logits
Token     Logit
tip       2.56
kit       2.46
set       2.35
maker     2.35
bars      2.14
makers    2.01
sets      1.99
chain     1.97
box       1.94
nos       1.91
Activations Density 0.160%