INDEX
Explanations
descriptions of events or actions involving multiple people
Neuron Alignment
Index   Value   % of L₁
1535    +0.25   0.8%
2034    +0.18   0.6%
382     +0.18   0.6%
Correlated Neurons
Index   P. Corr.   Cos Sim.
382     +0.25      0.10
1535    +0.18      0.08
310     +0.18      0.05
Negative Logits
abnorm   -1.51
Lmfao    -1.49
nece     -1.48
suspic   -1.45
uncin    -1.44
Ikr      -1.41
thut     -1.41
emphat   -1.40
antem    -1.40
Lma      -1.39
Positive Logits
↵↵         1.03
<eos>      1.03
But        0.92
However    0.92
↵↵↵        0.92
Although   0.90
This       0.89
There      0.88
Thus       0.87
Then       0.87
Activations Density: 0.486%