INDEX
Explanations
requests for feedback or communication prompts
Neuron Alignment
Index    Value    % of L₁
1150     +0.10    0.3%
453      +0.09    0.2%
1005     +0.08    0.2%
Correlated Neurons
Index    P. Corr.    Cos Sim.
1415     +0.10       0.02
1759     +0.09       0.03
366      +0.08       0.02
Negative Logits
javier       -0.77
magis        -0.74
Juf          -0.71
santiago     -0.70
imbal        -0.68
roberto      -0.67
alberto      -0.66
claudia      -0.66
fua          -0.66
frankfurt    -0.65
Positive Logits
contact       0.60
SharedDtor    0.56
please        0.55
contact       0.53
NSCoder       0.51
Contact       0.51
CONTACT       0.50
please        0.49
<bos>         0.49
rivol         0.48
Activations Density 0.172%