INDEX
Explanations
expressions of gratitude and appreciation
Neuron Alignment
Index   Value   % of L₁
2034    +0.19   0.5%
478     +0.10   0.3%
849     +0.09   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1445    +0.19      0.05
1200    +0.10      0.04
924     +0.09      0.04
Negative Logits
rodriguez     -1.18
javier        -1.17
alberto       -1.14
pamph         -1.12
fernando      -1.11
impractica    -1.11
felipe        -1.09
pymysql       -1.08
Minang        -1.06
jorge         -1.05
Positive Logits
<bos>           0.87
WithMany        0.69
***!            0.66
BeginInit       0.66
Thank           0.64
脚注の使い方    0.60
ViewFeatures    0.59
We              0.57
GEBURTSDATUM    0.57
Thanks          0.56
Activations Density: 0.278%