Explanations
phrases where a person is talking about themselves or their actions
Neuron Alignment
Index   Value   % of L₁
1978    +0.18   0.6%
381     +0.16   0.5%
1919    +0.15   0.5%
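The Neuron Alignment table appears to list the neurons that this feature's decoder vector weights most heavily. Below is a minimal sketch. It assumes that "Value" is the raw decoder weight for a neuron and that "% of L₁" is that weight's absolute value divided by the L₁ norm of the whole decoder row. The function `neuron_alignment` and the toy row are hypothetical and are not the dashboard's own code.

```python
def neuron_alignment(decoder_row, k=3):
    """Return (neuron index, weight, share of L1 norm) for the k largest weights.

    Assumption: decoder_row holds one feature's decoder weights, one per neuron.
    """
    l1 = sum(abs(w) for w in decoder_row)
    if l1 == 0:
        raise ValueError("decoder row is all zeros")
    # Rank neurons by signed weight, largest first (stable for ties).
    ranked = sorted(range(len(decoder_row)), key=lambda i: decoder_row[i], reverse=True)
    return [(i, decoder_row[i], abs(decoder_row[i]) / l1) for i in ranked[:k]]


# Hypothetical 4-neuron decoder row whose L1 norm is exactly 1.0.
print(neuron_alignment([0.25, -0.5, 0.125, 0.125], k=2))
# [(0, 0.25, 0.25), (2, 0.125, 0.125)]
```

Ranking by signed value rather than absolute value matches the table above, where every listed weight is positive. A large negative weight (neuron 1 here) is not reported by this sketch.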
Correlated Neurons
Index   Pearson Corr.   Cos. Sim.
1919    +0.18           0.12
805     +0.16           0.11
1415    +0.15           0.07
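Both columns of the Correlated Neurons table can be read as similarity scores between two activation vectors recorded over the same tokens. Below is a minimal sketch. It assumes the two columns are the Pearson correlation and the cosine similarity of the raw feature activations and neuron activations. The function names and toy data are hypothetical. The only difference between the two measures is that Pearson correlation subtracts each vector's mean first.

```python
import math


def cosine_similarity(x, y):
    """Uncentered cosine similarity of two activation vectors over the same tokens."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    if norm == 0:
        raise ValueError("cosine similarity is undefined for an all-zero vector")
    return dot / norm


def pearson_correlation(x, y):
    """Pearson correlation: cosine similarity after centering each vector on its mean."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return cosine_similarity([a - mean_x for a in x], [b - mean_y for b in y])


# Hypothetical toy data: the neuron tracks the feature plus a constant offset.
feature_acts = [0.0, 0.0, 2.0, 1.0]
neuron_acts = [1.0, 1.0, 3.0, 2.0]
print(round(pearson_correlation(feature_acts, neuron_acts), 3))  # 1.0
print(round(cosine_similarity(feature_acts, neuron_acts), 3))    # 0.924
```

Because the neuron differs from the feature only by a constant offset, centering removes the difference and the Pearson correlation is 1.0. The uncentered cosine similarity is lower.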
Negative Logits
susun        -0.83
panahon      -0.78
epoca        -0.77
poliester    -0.77
vinil        -0.75
pietre       -0.75
burbu        -0.75
répon        -0.74
parlamento   -0.74
Campionato   -0.73
Positive Logits
créateur       0.69
want           0.68
journalistes   0.66
spécialistes   0.65
encomp         0.65
shenan         0.65
intersper      0.65
disagre        0.64
imprimée       0.64
<_>            0.63
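The Negative Logits and Positive Logits lists rank output tokens by how much the feature pushes their logits down or up. Below is a minimal sketch. It assumes the effect is the feature's decoder direction projected through the unembedding matrix, with no layer-norm or other scaling. The names `logit_effects`, `W_U`, and the toy vocabulary are hypothetical. The dashboard's exact normalization is not stated here.

```python
def logit_effects(decoder_dir, W_U, vocab, k=3):
    """Return the k most negative and k most positive per-token logit effects.

    Assumptions: decoder_dir has length d_model; W_U is a list of d_model rows,
    each of length len(vocab); no layer-norm scaling is applied.
    """
    # Effect on token t = dot product of the decoder direction with column t of W_U.
    scores = [sum(decoder_dir[r] * W_U[r][t] for r in range(len(decoder_dir)))
              for t in range(len(vocab))]
    order = sorted(range(len(vocab)), key=lambda t: scores[t])  # ascending
    negative = [(vocab[t], scores[t]) for t in order[:k]]
    positive = [(vocab[t], scores[t]) for t in reversed(order[-k:])]
    return negative, positive


# Hypothetical 2-dimensional model with a 4-token vocabulary.
W_U = [[0.5, -0.25, 0.0, 1.0],
       [9.0, 9.0, 9.0, 9.0]]
neg, pos = logit_effects([1.0, 0.0], W_U, ["a", "b", "c", "d"], k=2)
print(neg)  # [('b', -0.25), ('c', 0.0)]
print(pos)  # [('d', 1.0), ('a', 0.5)]
```

The second row of `W_U` never contributes because the decoder direction has a zero in that dimension. Only the directions the feature writes to affect the ranking.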
Activation Density: 0.330%
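Activation density appears to be the share of sampled tokens on which the feature is active. At 0.330%, that is roughly 3.3 tokens in every 1,000. Below is a minimal sketch. It assumes "active" means an activation strictly above zero. The threshold and the sample are assumptions.

```python
def activation_density(acts, threshold=0.0):
    """Fraction of tokens on which the feature's activation exceeds threshold."""
    if not acts:
        raise ValueError("need at least one token")
    return sum(1 for a in acts if a > threshold) / len(acts)


# Hypothetical sample: the feature fires on 3 of 1,000 tokens.
acts = [0.0] * 997 + [1.2, 0.4, 2.0]
print(f"{activation_density(acts):.3%}")  # 0.300%
```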