Explanations
terms related to specific goals or intentions
Neuron Alignment
Index    Value    % of L₁
2025     +0.09    0.3%
596      +0.08    0.2%
2041     +0.08    0.2%
Correlated Neurons
Index    P. Corr.    Cos Sim.
2041     +0.09       0.05
596      +0.08       0.04
1992     +0.08       0.03
Negative Logits
kram       -0.81
Okt        -0.79
naer       -0.72
saad       -0.72
gij        -0.69
kug        -0.68
osal       -0.68
territo    -0.68
uhr        -0.68
priva      -0.67
Positive Logits
goal            0.72
Goal            0.58
goal            0.58
aim             0.55
Goal            0.54
objective       0.53
OBJECTIVE       0.52
déclarations    0.52
objectifs       0.52
sièges          0.51
Activations Density: 0.172%