INDEX
Explanations
phrases related to names of people
Neuron Alignment
Index    Value    % of L₁
1034     +0.15    0.6%
1141     +0.15    0.6%
204      +0.14    0.6%
Correlated Neurons
Index    P. Corr.    Cos Sim.
1141     +0.15       0.03
1034     +0.15       0.02
168      +0.14       0.02
Negative Logits
unspeak     -1.07
gaily       -0.97
increa      -0.96
shenan      -0.96
apprehen    -0.95
indescri    -0.94
assailed    -0.86
vainly      -0.85
disagre     -0.85
unavoid     -0.84
Positive Logits
Jim         1.56
Jim         1.55
JIM         1.26
jim         1.24
JIM         1.21
jim         1.17
James       0.75
Jimenez     0.73
James       0.73
Jiménez     0.72
Activations Density 0.055%