INDEX
Explanations
references to notable historical figures and events
New Auto-Interp
Neuron Alignment
Index   Value   % of L₁
369     +0.17   1.0%
31      +0.17   1.0%
286     +0.14   0.8%
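The Neuron Alignment table appears to rank neurons by the weight this feature places on them and reports each weight's share of the L₁ norm. Below is a minimal sketch of that share computation. It assumes "% of L₁" means |wᵢ| / Σⱼ|wⱼ| over the feature's weight vector in the neuron basis. The helper name `l1_share` and the toy weights are hypothetical and do not come from the dashboard.

```python
def l1_share(weights):
    """Rank weights by magnitude and report each one's share of the L1 norm.

    Hypothetical helper: assumes "% of L1" means |w_i| / sum_j |w_j|.
    """
    total = sum(abs(w) for w in weights)
    rows = [(i, w, 100.0 * abs(w) / total) for i, w in enumerate(weights)]
    # Largest-magnitude weights first (assumed ordering)
    return sorted(rows, key=lambda row: abs(row[1]), reverse=True)


# Toy weight vector (made-up numbers, not the feature shown here)
for index, value, pct in l1_share([0.05, -0.02, 0.17, 0.01])[:3]:
    print(f"{index:<6} {value:+.2f}  {pct:.1f}%")
```

Under this reading, a small per-neuron share such as 1.0% means the feature is spread across many neurons rather than aligned with any single one.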
Correlated Neurons
Index   Pearson Corr.   Cosine Sim.
286     +0.17           0.11
365     +0.17           0.07
31      +0.14           0.14
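The two similarity measures in the Correlated Neurons table can be sketched as follows. This assumes "P. Corr." is the Pearson correlation between the feature's activations and a neuron's activations over the same tokens, and "Cos Sim." is the cosine similarity between two weight vectors. The dashboard does not say which vectors are compared. The function names and the sample numbers are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length activation series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


def cosine(u, v):
    """Cosine similarity between two weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


# Made-up activations of a feature and one neuron over five tokens
feature_acts = [0.0, 1.2, 0.0, 0.8, 0.1]
neuron_acts = [0.3, 0.9, -0.1, 0.2, 0.4]
print(f"P. Corr.: {pearson(feature_acts, neuron_acts):+.2f}")
print(f"Cos Sim.: {cosine([0.2, -0.1, 0.5], [0.1, 0.0, 0.4]):.2f}")
```

Pearson correlation is the cosine similarity of the mean-centered series, so the two columns can diverge when activations have a large nonzero mean.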
Negative Logits
dominant    -1.33
enance      -1.29
eds         -1.28
mente       -1.26
'"          -1.26
linger      -1.24
elp         -1.24
(%          -1.24
ool         -1.23
lication    -1.23
Positive Logits
²           1.65
marks       1.63
âĢħ         1.43   (byte-level BPE form of U+2005, four-per-em space)
ĻĤ          1.40
dered       1.35
itos        1.33
[â̦]        1.32
oths        1.31
thousands   1.24
hundreds    1.23
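The Negative and Positive Logits lists show the tokens whose output logits this feature pushes down or up most directly. Below is a minimal sketch of how such lists could be produced. It assumes each token's score is the dot product of the feature's output direction with that token's row of the unembedding matrix, and it ignores any final layer norm. The helper `logit_extremes`, the row-per-token layout, and the toy vocabulary are hypothetical.

```python
def logit_extremes(direction, unembed, vocab, k=10):
    """Return the k most negative and k most positive direct logit effects.

    Hypothetical helper: assumes each token's effect is the dot product of
    the feature's output direction with that token's unembedding row
    (final layer norm ignored).
    """
    scores = [sum(d * w for d, w in zip(direction, row)) for row in unembed]
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]            # most negative first
    positive = ranked[::-1][:k]      # most positive first
    return negative, positive


# Toy 2-d model with a 4-token vocabulary (made-up numbers)
vocab = ["alpha", "beta", "gamma", "delta"]
unembed = [[1.0, 0.5], [-1.2, 0.1], [0.8, 0.4], [0.0, 0.0]]
neg, pos = logit_extremes([1.0, 0.5], unembed, vocab, k=2)
print("Negative:", neg)
print("Positive:", pos)
```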
Activation Density: 4.437%