INDEX
Explanations
specific years, particularly those related to significant historical events
Neuron Alignment
Index    Value    % of L₁
156      +0.16    0.9%
82       +0.13    0.7%
330      +0.12    0.7%
Correlated Neurons
Index    P. Corr.    Cos Sim.
82       +0.16       0.01
110      +0.13       0.01
475      +0.12       0.01
Negative Logits
suspicion     -1.71
ird           -1.50
behold        -1.45
ĻĤ            -1.40
dissent       -1.39
gmail         -1.38
realm         -1.37
privileges    -1.37
yon           -1.36
arts          -1.34
Positive Logits
icolor         1.50
etched         1.49
endants        1.48
anniversary    1.46
okia           1.41
icol           1.41
ienn           1.38
uary           1.36
^[@            1.35
erals          1.34
Activations Density: 0.017%