INDEX
Explanations
phrases related to conflict and controversy
Neuron Alignment
Index   Value   % of L₁
872     +0.10   0.3%
382     +0.10   0.3%
882     +0.09   0.2%
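The Value and % of L₁ columns are consistent with ranking the feature's decoder direction by per-neuron weight and reporting each weight's share of that direction's L₁ norm. The dashboard does not say exactly how it computes them, so the following is only a minimal sketch under that assumption: `neuron_alignment` and `decoder_row` are hypothetical names, ranking is by signed value (largest first), and the six-neuron vector is invented for illustration.

```python
def neuron_alignment(decoder_row, k=3):
    """Return (index, value, pct_of_l1) for the k largest decoder weights.

    Assumption: decoder_row is the feature's decoder direction, one weight
    per neuron, and neurons are ranked by signed value, largest first.
    """
    l1 = sum(abs(w) for w in decoder_row)  # L1 norm of the whole direction
    ranked = sorted(range(len(decoder_row)), key=lambda i: decoder_row[i], reverse=True)
    return [(i, decoder_row[i], 100.0 * abs(decoder_row[i]) / l1) for i in ranked[:k]]


# Toy 6-neuron direction, made up purely for illustration.
toy = [0.05, -0.20, 0.10, 0.40, -0.05, 0.20]
for idx, val, pct in neuron_alignment(toy):
    print(f"{idx:>4}  {val:+.2f}  {pct:.1f}%")
```

Because every weight counts toward the L₁ norm, a share of only 0.2 to 0.3% for the top neurons suggests the feature's direction is spread across many neurons rather than aligned with any single one.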
Correlated Neurons
Index   Pearson Corr.   Cosine Sim.
382     +0.10           0.05
1265    +0.10           0.04
588     +0.09           0.05
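The correlation and cosine-similarity columns above can be illustrated with the two standard formulas. This is a sketch, not the dashboard's code: here both measures are applied to made-up per-token activation vectors, and which vectors the dashboard actually compares for the cosine column (for example, weight directions rather than activations) is not stated, so that pairing is an assumption.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


# Made-up per-token activations for one feature and one neuron (assumption).
feature_acts = [0.0, 1.0, 0.0, 2.0, 0.0]
neuron_acts = [0.1, 0.9, -0.2, 1.5, 0.0]
print(f"Pearson: {pearson(feature_acts, neuron_acts):+.2f}")
print(f"cosine:  {cosine(feature_acts, neuron_acts):.2f}")
```

Pearson correlation is the cosine similarity of the mean-centred vectors, so the two measures coincide when both inputs already have zero mean and can diverge otherwise.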
Negative Logits
alkoh      -1.02
kram       -0.97
praktik    -0.97
meis       -0.96
makro      -0.92
kosme      -0.91
franz      -0.91
antik      -0.90
pira       -0.90
solidar    -0.90
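Positive Logits
sondern    1.02   (German, "but rather")
nor        1.01
but        0.81
but        0.74
بلکه       0.74   (Persian, "but rather")
而是       0.72   (Chinese, "but rather")
nor        0.69
<bos>      0.68
nhưng      0.65   (Vietnamese, "but")
sino       0.60   (Spanish, "but rather")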
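Logit lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and reading off the highest- and lowest-scoring tokens. The dashboard does not state its exact procedure, so this is a minimal sketch under that assumption: `logit_effects`, the two-dimensional toy model, the `tok_*` names, and every number are hypothetical, and no final LayerNorm or bias is applied.

```python
def logit_effects(direction, unembed, k=3):
    """Rank tokens by the dot product of their unembedding vector with `direction`.

    Returns (top_k, bottom_k) as lists of (token, score); bottom_k is ordered
    most negative first. No final LayerNorm or bias is applied (a simplification).
    """
    scores = {
        tok: sum(d * u for d, u in zip(direction, vec))
        for tok, vec in unembed.items()
    }
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:k], ranked[::-1][:k]


# Toy 2-dimensional model with a five-token vocabulary; all numbers are made up.
direction = [1.0, -0.5]
unembed = {
    "tok_a": [1.0, -1.0],
    "tok_b": [0.8, 0.0],
    "tok_c": [0.5, 0.5],
    "tok_d": [-0.5, 1.0],
    "tok_e": [-1.0, 0.5],
}
top, bottom = logit_effects(direction, unembed, k=2)
print("positive:", top)
print("negative:", bottom)
```

Read this way, the positive list says that adding this feature's direction to the residual stream directly raises the logits of contrastive conjunctions ("but rather", "nor") across several languages.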
Activation Density: 0.226%
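Activation density is read here as the percentage of sampled tokens on which the feature fires; at 0.226%, that is roughly 2 to 3 tokens in every 1,000. The firing criterion (activation strictly above zero) and the name `activation_density` are assumptions for this sketch, and the activations are invented.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose feature activation is above `threshold`."""
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# 1,000 made-up token activations, two of which are non-zero.
acts = [0.0] * 998 + [0.7, 1.3]
print(f"{activation_density(acts):.3f}%")
```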