Explanations
references to social media and its impact
Neuron Alignment
Index   Value   % of L₁
494     +0.13   0.7%
400     +0.12   0.6%
328     +0.12   0.6%
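One plausible reading of this table is that "Value" is the feature's decoder weight on each MLP neuron and "% of L₁" is that weight's magnitude divided by the L₁ norm of the whole decoder vector. The dashboard does not confirm this, so treat it as an assumption. The minimal sketch below follows that reading. The function name `neuron_alignment` and the choice to rank neurons by absolute weight are hypothetical.

```python
def neuron_alignment(decoder_vec, k=3):
    """Return the top-k neurons as (index, weight, % of L1 norm).

    Assumption: decoder_vec holds one decoder weight per MLP neuron for a
    single feature. Ranking by |weight| is a hypothetical choice; the
    dashboard may rank by signed value instead.
    """
    l1 = sum(abs(w) for w in decoder_vec)
    if l1 == 0:
        raise ValueError("decoder vector is all zeros")
    # Sort neuron indices by weight magnitude, largest first (stable for ties).
    ranked = sorted(range(len(decoder_vec)), key=lambda i: abs(decoder_vec[i]), reverse=True)
    # Report each weight's share of the L1 norm as a percentage.
    return [(i, decoder_vec[i], 100.0 * abs(decoder_vec[i]) / l1) for i in ranked[:k]]
```

For a toy vector, `neuron_alignment([0.125, -0.5, 0.25, 0.125], k=2)` returns `[(1, -0.5, 50.0), (2, 0.25, 25.0)]`. A share of only 0.7% for the strongest neuron, as in the table, suggests that the feature's decoder weight is spread across many neurons rather than concentrated in a few.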
Correlated Neurons
Index   Pearson Corr.   Cosine Sim.
262     +0.13           0.04
494     +0.12           0.03
400     +0.12           0.02
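This sketch assumes that both columns compare the feature's per-token activations with each neuron's per-token activations over the same set of tokens. The dashboard does not state what the cosine similarity is computed over, so that part is an assumption. Under that reading, the two statistics differ in only one respect: Pearson correlation mean-centers both vectors before taking the cosine. The sketch below uses only the standard library, and the function names are hypothetical.

```python
import math


def cosine(x, y):
    """Uncentered cosine similarity between two per-token activation vectors."""
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    if nx == 0 or ny == 0:
        # Also reached by pearson() when an input is constant (zero after centering).
        raise ValueError("undefined for a zero vector")
    return sum(a * b for a, b in zip(x, y)) / (nx * ny)


def pearson(x, y):
    """Pearson correlation: the cosine similarity of the mean-centered vectors."""
    if len(x) != len(y) or not x:
        raise ValueError("inputs must be non-empty and the same length")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return cosine([a - mx for a in x], [b - my for b in y])
```

The toy vectors `[1, 0, 0]` and `[0, 1, 1]` show why the columns can disagree. They are orthogonal, so `cosine` returns `0.0`. After centering, however, they are perfectly anti-correlated, so `pearson` returns `-1.0` up to rounding.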
Negative Logits
ĥ½               -5.76
²                -5.58
¾                -5.49
↵                -5.46
<|outofrange|>   -5.46
↵                -5.46
↵↵               -5.46
                 -5.46
<|outofrange|>   -5.46
↵                -5.46
Positive Logits
OTE       1.93
rior      1.86
notices   1.64
emon      1.62
acker     1.60
link      1.52
uncher    1.52
ookie     1.51
alerts    1.50
ijer      1.49
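A common way to produce logit lists like these, assumed here rather than confirmed by the dashboard, is to project the feature's output direction onto the unembedding matrix and then read off the largest and smallest entries. The sketch below makes two further simplifications. It ignores the final LayerNorm. For an SAE trained on MLP neurons, it also assumes that the decoder vector has already been mapped into the residual stream by the MLP output weights. The inputs `direction`, `W_U`, and `vocab` are hypothetical.

```python
def logit_effects(direction, W_U, vocab, k=3):
    """Return (top_k, bottom_k) lists of (token, direct logit effect).

    Hypothetical interface: `direction` is a length-d_model list, `W_U` is
    d_model rows of len(vocab) columns, and the final LayerNorm is ignored.
    """
    d_model, n_vocab = len(direction), len(vocab)
    # Direct logit effect on each vocabulary token: direction @ W_U.
    effects = [sum(direction[r] * W_U[r][t] for r in range(d_model)) for t in range(n_vocab)]
    order = sorted(range(n_vocab), key=lambda t: effects[t])  # ascending effect
    bottom = [(vocab[t], effects[t]) for t in order[:k]]
    top = [(vocab[t], effects[t]) for t in order[::-1][:k]]
    return top, bottom
```

Ties are common at the bottom of such lists: in the Negative Logits list above, seven tokens share the value -5.46. For tied tokens, the display order depends on the sort and carries no meaning.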
Activation Density: 0.513%
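This sketch reads activation density as the percentage of dataset tokens on which the feature's activation is strictly positive. The exact firing threshold is an assumption. At 0.513%, the feature fires on roughly 1 token in 195. The function name is hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose feature activation exceeds `threshold`.

    Assumption: the feature "fires" on a token when its activation is
    strictly greater than 0 (the default threshold).
    """
    if not activations:
        raise ValueError("need at least one token activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)
```

For example, `activation_density([0.0, 0.5, 0.0, 0.0, 1.2, 0.0, 0.0, 0.0])` returns `25.0`, because the feature fires on 2 of 8 tokens.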