INDEX
Explanations
phrases indicating comparison or contrast
Neuron Alignment
Index    Value    % of L₁
50       +0.12    0.4%
453      +0.11    0.3%
1218     +0.10    0.3%
Correlated Neurons
Index    P. Corr.    Cos Sim.
16       +0.12       0.04
1806     +0.11       0.03
3        +0.10       0.04
Negative Logits
aussitôt       -0.92
quelquefois    -0.87
soudain        -0.85
poichè         -0.85
perciò         -0.85
quoique        -0.84
volon          -0.83
Ikr            -0.83
constamment    -0.83
librement      -0.81
Positive Logits
YOND            0.59
also            0.52
wavering        0.52
MainAxisSize    0.49
olybden         0.49
ISTRATION       0.49
importantly     0.48
함께            0.48
amp             0.47
thora           0.47
Activations Density: 0.125%