Explanation (New Auto-Interp): specific movie titles and references to popular media
Neuron Alignment
Index   Value   % of L₁
50      +0.20   0.8%
690     +0.14   0.6%
2034    +0.14   0.5%
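The dashboard does not say how the Neuron Alignment table is produced. A minimal sketch, assuming "Value" is the feature's decoder weight on each neuron, rows are ranked by the magnitude of that weight, and "% of L₁" is the weight's absolute value divided by the L₁ norm of the whole decoder vector. The function name `neuron_alignment` and its inputs are hypothetical.

```python
# Hypothetical sketch: rank neurons by the feature's decoder weight and report
# each weight's share of the decoder vector's L1 norm (the "% of L1" column).
def neuron_alignment(decoder, k=3):
    """Return (neuron_index, weight, fraction_of_l1) for the k largest |weights|.

    `decoder` is assumed to be the feature's decoder vector, one entry per neuron.
    """
    l1 = sum(abs(w) for w in decoder)  # L1 norm of the decoder vector
    # Sort neuron indices by absolute weight, largest first (stable on ties).
    ranked = sorted(range(len(decoder)), key=lambda i: abs(decoder[i]), reverse=True)
    return [(i, decoder[i], abs(decoder[i]) / l1) for i in ranked[:k]]


# Toy decoder whose L1 norm is exactly 1.0, so fractions equal |weights|.
print(neuron_alignment([0.125, -0.5, 0.25, 0.125]))
```

Under this reading, the 0.8% for neuron 50 means its single weight is a small slice of a decoder vector spread over many neurons.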
Correlated Neurons
Index   Pearson Corr.   Cosine Sim.
304     +0.20           0.02
138     +0.14           0.07
113     +0.14           0.06
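The correlation and cosine-similarity columns can be illustrated with the two standard formulas below. Which vectors feed them is an assumption here, not something the dashboard states: the correlation is taken to compare the feature's activations with a neuron's activations over the same tokens, and the cosine similarity to compare two direction vectors (for example, the feature's decoder vector and the neuron's basis direction). The helper names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences (mean-centred)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def cosine(u, v):
    """Cosine similarity between two vectors (no mean-centring)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


# Hypothetical per-token activations for one feature and one neuron.
feature_acts = [0.0, 1.0, 0.0, 2.0]
neuron_acts = [0.1, 0.9, 0.0, 2.1]
print(pearson(feature_acts, neuron_acts))
```

The two measures can disagree, as in the table: a neuron can co-vary with the feature (positive correlation) while sitting almost orthogonal to the feature's direction (cosine near zero).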
Negative Logits
Token            Logit
<bos>            -3.37
ⓧ                -0.94
הת               -0.74
הח               -0.74
SequentialGroup  -0.71
הע               -0.70
Identyfik        -0.70
nawr             -0.69
<?               -0.69
#![              -0.69
Positive Logits
Token            Logit
maneu            2.28
impra            2.07
increa           1.98
accla            1.97
disagre          1.95
emphat           1.93
shenan           1.92
depic            1.86
reluct           1.86
inev             1.86
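The two logit lists above read like the tokens whose output logits the feature pushes up or down the most. A minimal sketch of that kind of "direct logit effect", assuming `decoder` is the feature's output direction already mapped into the residual stream and `unembed` is the unembedding matrix (one row per residual dimension, one column per vocabulary token). Both inputs, the function name, and the decision to ignore any final layer norm are assumptions, not the dashboard's documented method. For a dictionary defined over MLP neurons, the decoder vector would first have to pass through the MLP output weights.

```python
# Hypothetical sketch: project a feature direction through the unembedding
# and report the k most positive and k most negative token logits.
def logit_effects(decoder, unembed, vocab, k=10):
    """Return (positive, negative) lists of (token, logit) pairs.

    decoder: list of floats, length d_model (assumed residual-stream direction).
    unembed: d_model rows, each a list of len(vocab) floats (assumed W_U).
    """
    logits = [
        sum(decoder[j] * unembed[j][t] for j in range(len(decoder)))
        for t in range(len(vocab))
    ]
    order = sorted(range(len(vocab)), key=lambda t: logits[t])  # ascending
    negative = [(vocab[t], logits[t]) for t in order[:k]]
    positive = [(vocab[t], logits[t]) for t in reversed(order[-k:])]
    return positive, negative


vocab = ["a", "b", "c"]
decoder = [1.0, 2.0]
unembed = [[1.0, 0.0, -1.0], [0.0, 1.0, 0.0]]
print(logit_effects(decoder, unembed, vocab, k=1))
```

Here `k` should be positive; a real vocabulary would also make `k` much smaller than its size, so the two lists do not overlap.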
Activation density: 0.270%
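An activation density of 0.270% means the feature fires on roughly 27 of every 10,000 tokens. A minimal sketch, assuming density is the fraction of tokens whose feature activation exceeds zero (the threshold and the function name are assumptions):

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens on which the feature is active (activation > threshold)."""
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# 27 active tokens out of 10,000 gives a density of 0.27%.
acts = [0.0] * 9973 + [1.0] * 27
print(f"{activation_density(acts):.3%}")
```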