Explanations
percentage values and their context in data discussions
Neuron Alignment
Index    Value    % of L₁
186      +0.15    0.9%
288      +0.15    0.8%
161      +0.11    0.6%
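
As a rough illustration of the "% of L₁" column: one plausible reading, which is an assumption here and not something the page confirms, is that it reports how much of the L₁ norm of the feature's weight vector a single neuron's weight accounts for. The function name `l1_share` and the toy vector below are hypothetical and are not taken from the dashboard.

```python
def l1_share(weights, index):
    """Return |weights[index]| as a percentage of the vector's L1 norm.

    Assumption: "% of L1" means one entry's absolute value divided by
    the sum of absolute values of all entries in the same vector.
    """
    l1_norm = sum(abs(w) for w in weights)
    if l1_norm == 0:
        raise ValueError("L1 norm is zero; the share is undefined")
    return 100.0 * abs(weights[index]) / l1_norm


# Toy vector for illustration only (not the feature's real weights).
toy_weights = [0.5, -0.25, 0.25, 0.0]
print(l1_share(toy_weights, 0))  # 50.0
```

Under this reading, a top neuron below 1% would suggest that the feature's weight is spread over many neurons instead of being concentrated in a few.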
Correlated Neurons
Index    P. Corr.    Cos Sim.
187      +0.15       0.07
288      +0.15       0.13
7        +0.11       -0.01
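
The two columns above measure different things, which is why they can disagree in magnitude and even in sign. A minimal sketch of both measures follows, assuming that "P. Corr." abbreviates Pearson correlation and "Cos Sim." abbreviates cosine similarity. What each is computed over (for example activations versus weight vectors) is not stated on the page, so the inputs below are hypothetical toy lists.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def pearson_correlation(a, b):
    """Pearson correlation: cosine similarity of the mean-centered vectors."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    centered_a = [x - mean_a for x in a]
    centered_b = [y - mean_b for y in b]
    return cosine_similarity(centered_a, centered_b)


# Toy data for illustration only: b is a shifted copy of a.
a = [1.0, 2.0, 3.0]
b = [11.0, 12.0, 13.0]
print(pearson_correlation(a, b))  # close to 1: the shift is removed by centering
print(cosine_similarity(a, b))    # below 1: the shift changes the direction
```

Because Pearson correlation subtracts the mean before comparing, it ignores constant offsets that cosine similarity still sees.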
Negative Logits
Ĵ        -2.01
Ĥ        -1.98
¸        -1.70
·¸       -1.69
Ľ        -1.65
»        -1.62
iendo    -1.61
Ĭ        -1.59
Ģ        -1.58
ág       -1.56
Positive Logits
à¯ģ      1.59
pired    1.50
?’       1.45
ி        1.42
?'       1.41
gran     1.37
á¼       1.37
loyalty  1.31
????     1.30
?        1.27
Activations Density: 4.453%