INDEX
Explanations
syntactic structures in text
Neuron Alignment
Index   Value   % of L₁
814     +0.18   0.7%
1464    +0.13   0.5%
1387    +0.13   0.5%
Correlated Neurons
Index   P. Corr.   Cos Sim.
878     +0.18      0.03
814     +0.13      0.02
1387    +0.13      0.02
Negative Logits
earnestness   -0.53
frivol        -0.47
corpion       -0.44
nobly         -0.44
undial        -0.42
walang        -0.41
ingrat        -0.40
impet         -0.40
Bekasi        -0.40
couts         -0.40
Positive Logits
Sy    1.13
Sy    1.11
SYN   1.07
syn   1.06
sy    1.03
Syn   1.03
Syn   1.02
SY    0.96
sy    0.96
SYN   0.93
Activations Density 0.111%