INDEX
Explanations
descriptions of actions or events in a narrative context
Neuron Alignment
Index   Value   % of L₁
382     +0.20   0.6%
1265    +0.12   0.4%
2019    +0.11   0.4%
Correlated Neurons
Index   P. Corr.   Cos Sim.
382     +0.20      0.06
1265    +0.12      0.05
1413    +0.11      0.03
Negative Logits
inev        -1.41
maneu       -1.38
disagre     -1.37
reluct      -1.36
depic       -1.34
kasa        -1.34
abstrait    -1.34
suscep      -1.33
encomp      -1.32
indestru    -1.32
Positive Logits
albeit      0.68
yet         0.63
vibrant     0.62
reliable    0.62
efficient   0.60
Reparto     0.59
لكن         0.59
Même        0.58
Edición     0.58
pero        0.58
Activations Density 0.147%