INDEX
Explanations
instances of the word "first" and its variations, indicating a focus on initial occurrences or beginnings
Neuron Alignment
Index  Value  % of L₁
22     +0.12  0.7%
44     +0.12  0.7%
56     +0.11  0.6%
Correlated Neurons
Index  P. Corr.  Cos Sim.
340    +0.12     0.06
22     +0.12     0.05
170    +0.11     0.07
Negative Logits
utter   -1.65
à¯ģ     -1.63
áĢº     -1.54
osi     -1.48
lett    -1.48
à¯į     -1.48
à¯      -1.47
aid     -1.45
ÑĥÑĤ    -1.45
ollo    -1.44
Positive Logits
ĻĤ      2.81
¿½      2.70
Ŀ       2.63
ħ       2.42
»¿      2.33
ľĵ      2.31
£       2.21
ĭ       2.21
ª       2.21
ĥ       2.16
Activations Density: 0.921%