Explanations
instructions or guidelines expressed in a direct and authoritative manner
Neuron Alignment
Index   Value   % of L₁
1937    +0.14   0.5%
204     +0.14   0.5%
897     +0.13   0.4%
Correlated Neurons
Index   P. Corr.   Cos Sim.
204     +0.14      0.05
2030    +0.14      0.04
1937    +0.13      0.06
Negative Logits
kontrol   -0.61
WC        -0.55
NC        -0.55
groups    -0.55
AC        -0.54
impact    -0.54
group     -0.54
OC        -0.53
focused   -0.53
Ca        -0.53
Positive Logits
disreg    1.44
sergio    1.43
peppa     1.40
jorge     1.37
milf      1.37
ricardo   1.36
javier    1.35
alberto   1.34
roberto   1.33
felipe    1.33
Activations Density 0.761%