Explanations
political references or news articles
Neuron Alignment
Index   Value   % of L₁
599     +0.24   0.9%
50      +0.16   0.6%
1842    +0.12   0.4%
Correlated Neurons
Index   P. Corr.   Cos Sim.
599     +0.24      0.11
1       +0.16      0.06
613     +0.12      0.07
Negative Logits
<bos>          -0.97
pinulongan     -0.67
Normdatei      -0.58
WriteBarrier   -0.56
NUKAT          -0.55
انيف           -0.53
OGND           -0.52
الإنجليزية     -0.50
ffilmiau       -0.50
<eos>          -0.49
Positive Logits
emphat       1.33
accla        1.07
practition   1.04
Khart        0.98
maneu        0.97
reluct       0.97
volunte      0.94
fta          0.94
embra        0.94
disagre      0.93
Activations Density: 1.499%