Explanations
Instances relating to making plans or organizing events
Neuron Alignment
Index   Value   % of L₁
2034    +0.25   0.7%
1535    +0.12   0.4%
1699    +0.09   0.3%
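A minimal sketch of how a neuron-alignment table like this could be produced. It assumes that "Value" is the feature's raw decoder weight on a neuron, that "% of L₁" is the absolute value of that weight divided by the L₁ norm of the feature's whole decoder row, and that neurons are ranked by weight magnitude. These are assumptions, not the dashboard's documented method. The function name `neuron_alignment` and the toy row are hypothetical.

```python
def neuron_alignment(decoder_row, k=3):
    """Top-k neurons for one feature as (neuron_index, value, pct_of_l1).

    Hypothetical assumptions, not the dashboard's documented method:
    - 'Value' is the feature's raw decoder weight on that neuron.
    - '% of L1' is |weight| divided by the L1 norm of the whole decoder row.
    - Neurons are ranked by the magnitude of their weight.
    """
    l1 = sum(abs(w) for w in decoder_row)
    ranked = sorted(range(len(decoder_row)), key=lambda i: abs(decoder_row[i]), reverse=True)
    return [(i, decoder_row[i], 100.0 * abs(decoder_row[i]) / l1) for i in ranked[:k]]


# Toy 5-neuron decoder row (made-up numbers, not this feature's weights).
toy_row = [0.10, -0.05, 0.25, 0.02, 0.08]
for idx, val, pct in neuron_alignment(toy_row):
    print(f"{idx:>4}  {val:+.2f}  {pct:.1f}%")
```

In the toy row the L₁ norm is 0.50, so neuron 2 (weight 0.25) accounts for 50% of it. A real decoder row spreads its norm over thousands of neurons, so each individual share is much smaller.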
Correlated Neurons
Index   Pearson Corr.   Cos Sim.
382     +0.25           0.07
827     +0.12           0.04
1775    +0.09           0.05
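The two statistics in this table can be sketched as follows. This assumes, as one plausible reading, that both the Pearson correlation (P. Corr.) and the cosine similarity (Cos Sim.) are computed between the feature's per-token activations and the neuron's per-token activations over the same set of tokens. The function names and the toy activation lists are hypothetical.

```python
import math


def cosine_sim(x, y):
    """Cosine similarity: dot product over the product of Euclidean norms (no centring)."""
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    if norm == 0.0:
        return 0.0  # Zero vector: treat as uncorrelated to avoid dividing by zero.
    return sum(a * b for a, b in zip(x, y)) / norm


def pearson_corr(x, y):
    """Pearson correlation: the cosine similarity of the mean-centred vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return cosine_sim([a - mx for a in x], [b - my for b in y])


# Hypothetical per-token activations over the same 4 tokens.
feature_acts = [0.0, 1.0, 0.0, 2.0]
neuron_acts = [0.5, 1.5, 0.5, 2.5]
print(round(pearson_corr(feature_acts, neuron_acts), 3),
      round(cosine_sim(feature_acts, neuron_acts), 3))
```

Here the toy neuron is the feature shifted by a constant 0.5, so the Pearson correlation is exactly 1. The uncentred cosine similarity is lower, about 0.97. Because one statistic subtracts the means and the other does not, the two need not agree.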
Negative Logits
emphat      -1.55
disagre     -1.49
hentai      -1.49
milf        -1.49
🤣🤣        -1.47
viciss      -1.46
unwarran    -1.43
inconce     -1.42
unlaw       -1.41
suspic      -1.41
Positive Logits
<eos>             0.86
WindowConstants   0.73
But               0.72
Hopefully         0.71
↵↵                0.66
Hopefully         0.65
}.                0.65
but               0.65
But               0.64
OnInit            0.64
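Logit lists like the two above are commonly obtained by projecting a feature's decoder direction through the model's unembedding matrix and sorting the resulting per-token scores. The sketch below assumes that each token's score is the dot product of the decoder row with that token's column of W_U, and it ignores the final layer norm and any scaling the dashboard may apply. The function `logit_effects`, the toy W_U and the placeholder tokens are hypothetical.

```python
def logit_effects(decoder_row, unembed, vocab, k=2):
    """Top-k positive and negative direct logit effects of one feature.

    Assumption: each token's score is decoder_row . W_U[:, token], where W_U is
    given as d_model rows of vocab_size entries. Final layer norm and any
    dashboard-specific scaling are ignored.
    """
    scores = [
        sum(decoder_row[d] * unembed[d][t] for d in range(len(decoder_row)))
        for t in range(len(vocab))
    ]
    order = sorted(range(len(vocab)), key=lambda t: scores[t])  # ascending by score
    top_negative = [(vocab[t], scores[t]) for t in order[:k]]
    top_positive = [(vocab[t], scores[t]) for t in reversed(order[-k:])]
    return top_positive, top_negative


# Hypothetical toy model: d_model = 2, vocabulary of 4 placeholder tokens.
vocab = ["tok_a", "tok_b", "tok_c", "tok_d"]
W_U = [
    [0.5, 0.2, -0.3, 0.0],
    [0.1, 0.4, 0.3, -0.2],
]
pos, neg = logit_effects([1.0, -1.0], W_U, vocab)
print("positive:", pos)
print("negative:", neg)
```

Sorting once and then slicing both ends gives the negative and positive lists from a single pass over the scores.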
Activations Density: 0.452%
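Assuming "Activations Density" is the share of sampled tokens on which the feature's activation is strictly positive, a density of 0.452% means the feature fires on about 452 of every 100,000 tokens. A minimal sketch of that calculation follows. The function name, the `threshold` parameter and the toy sample are hypothetical.

```python
def activation_density(feature_acts, threshold=0.0):
    """Percentage of tokens on which the feature's activation exceeds `threshold`."""
    if not feature_acts:
        return 0.0
    fired = sum(1 for a in feature_acts if a > threshold)
    return 100.0 * fired / len(feature_acts)


# Hypothetical sample: 100,000 tokens, the feature fires on 452 of them.
acts = [0.0] * (100_000 - 452) + [1.3] * 452
print(f"Activations Density: {activation_density(acts):.3f}%")
```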