INDEX
Explanations
terms related to conversion and conversion tools
Neuron Alignment
Index   Value   % of L₁
156     +0.14   0.8%
148     +0.13   0.8%
376     +0.12   0.7%
Correlated Neurons
Index   P. Corr.   Cos Sim.
148     +0.14      0.01
62      +0.13      0.01
474     +0.12      0.01
Negative Logits
fare         -2.10
iento        -1.79
hood         -1.67
lies         -1.60
else         -1.58
lessness     -1.58
rens         -1.50
kill         -1.45
friends      -1.45
dissection   -1.45
Positive Logits
itive      1.85
esium      1.70
herent     1.70
ivated     1.60
atable     1.60
thereto    1.58
ipher      1.49
ational    1.47
elesc      1.44
converts   1.44
Activations Density: 0.125%