INDEX
Explanations
verbs related to progress or accomplishment
Neuron Alignment
Index    Value    % of L₁
690      +0.14    0.4%
405      +0.10    0.3%
1392     +0.09    0.3%
Correlated Neurons
Index    P. Corr.    Cos Sim.
690      +0.14       0.04
1413     +0.10       0.04
1493     +0.09       0.04
Negative Logits
<bos>          -0.74
OA             -0.47
dépasse        -0.43
ולה            -0.42
OutOfBounds    -0.41
ferait         -0.40
pourra         -0.40
ROWS           -0.39
restera        -0.39
Your           -0.38
Positive Logits
AppBundle      0.71
corsair        0.68
maneu          0.68
hanggang       0.65
outlander      0.64
reft           0.63
hilux          0.62
ļ              0.61
Біографія      0.61
milf           0.61
Activations Density: 0.696%