INDEX
Explanations
terms related to research, education, and specific conditions or phenomena
Neuron Alignment
Index   Value   % of L₁
111     +0.15   0.9%
376     +0.14   0.8%
189     +0.13   0.7%
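The page does not define the "% of L₁" column. One plausible reading, sketched below, is that it reports each neuron's absolute value as a share of the L1 norm (sum of absolute values) of the feature's full vector over neurons. That reading is an assumption, and the function name `top_neuron_alignment` and the toy vector are hypothetical; none of them come from the dashboard itself.

```python
import numpy as np

def top_neuron_alignment(weights, k=3):
    """Return (index, value, share_of_l1) for the k largest-magnitude entries.

    Assumption: "% of L1" = |value| / sum(|weights|). The dashboard does not
    state this definition; it is one reading consistent with the column name.
    """
    weights = np.asarray(weights, dtype=float)
    l1 = np.abs(weights).sum()                    # L1 norm of the whole vector
    order = np.argsort(-np.abs(weights))[:k]      # indices sorted by magnitude, descending
    return [(int(i), float(weights[i]), float(abs(weights[i]) / l1)) for i in order]

# Toy vector (not the real feature): absolute values sum to 1.0
rows = top_neuron_alignment([0.5, -0.25, 0.125, 0.125], k=2)
for index, value, share in rows:
    print(f"{index}\t{value:+.2f}\t{share:.1%}")
```

Under this reading, a listed value of +0.15 at 0.9% would imply an L1 norm of roughly 17 for the full vector, which is at least consistent across the three rows shown once rounding is taken into account.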
Correlated Neurons
Index   P. Corr.   Cos Sim.
111     +0.15      0.01
46      +0.14      0.01
458     +0.13      0.01
Negative Logits
ĥ             -1.71
Ģ             -1.57
Ļª            -1.53
ability       -1.51
occasional    -1.49
ctomy         -1.48
coli          -1.48
willingness   -1.46
eping         -1.43
ters          -1.37
Positive Logits
inx           1.79
umerable      1.75
though        1.54
wait          1.53
ière          1.53
iously        1.48
yet           1.46
olecular      1.44
undo          1.44
it            1.43
Activations Density 0.017%