INDEX
Explanations
phrases related to leadership positions or roles
Neuron Alignment
Index   Value   % of L₁
1793    +0.14   0.5%
376     +0.13   0.5%
122     +0.12   0.4%
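
The "% of L₁" column can be read as the share of a weight vector's L₁ norm that a single neuron carries. The sketch below illustrates that reading only: it assumes the dashboard divides the absolute value of one entry by the sum of absolute values of all entries, which the page itself does not state. The function name `l1_share` and the toy weights are hypothetical and are not taken from the dashboard.

```python
def l1_share(weights, index):
    """Return the fraction of the vector's L1 norm carried by one entry."""
    l1_norm = sum(abs(w) for w in weights)
    if l1_norm == 0:
        raise ValueError("vector has zero L1 norm")
    return abs(weights[index]) / l1_norm


# Hypothetical 4-neuron weight vector (illustrative values only).
toy_weights = [0.5, -0.25, 0.125, 0.125]

share = l1_share(toy_weights, 0)
print(f"{share:.1%}")  # 50.0%
```

Under this reading, a top value of +0.14 at 0.5% would mean the weight is spread thinly over many neurons rather than concentrated in a few.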
Correlated Neurons
Index   P. Corr.   Cos Sim.
1793    +0.14      0.04
376     +0.13      0.04
650     +0.12      0.03
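
"P. Corr." and "Cos Sim." are taken here to mean Pearson correlation and cosine similarity. The sketch below shows how the two quantities differ. It assumes the usual definitions; what the dashboard actually compares (for example, activations over tokens for the correlation and weight vectors for the cosine) is an assumption and is not stated on the page. The function names and inputs are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def pearson_correlation(a, b):
    """Pearson correlation: cosine similarity of the mean-centered vectors."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    centered_a = [x - mean_a for x in a]
    centered_b = [y - mean_b for y in b]
    return cosine_similarity(centered_a, centered_b)


# Hypothetical activation traces over five tokens (illustrative values only).
feature_acts = [0.0, 1.0, 0.0, 2.0, 0.0]
neuron_acts = [0.5, 1.5, 0.5, 2.5, 0.5]

print(pearson_correlation(feature_acts, neuron_acts))
print(cosine_similarity(feature_acts, neuron_acts))
```

The two traces differ only by a constant offset, so the Pearson correlation is essentially 1 while the cosine similarity is lower. Because the two measures can disagree in this way, it is plausible for a table to report both.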
Negative Logits
becker      -0.50
smtplib     -0.49
autogui     -0.49
Sympathi    -0.48
fringed     -0.48
pettico     -0.47
colourful   -0.46
hever       -0.45
cami        -0.44
tutu        -0.44
Positive Logits
head    1.23
Head    1.22
head    1.21
Head    1.18
HEAD    1.12
HEAD    1.05
heads   1.01
heads   0.97
Heads   0.91
Heads   0.87
Activations Density 0.048%