INDEX
Explanations
mentions of job titles and roles within various organizations
Neuron Alignment
Index   Value   % of L₁
227     +0.17   0.5%
198     +0.12   0.3%
1842    +0.12   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
752     +0.17      0.04
16      +0.12      0.06
198     +0.12      0.04
Negative Logits
reft              -0.77
tranf             -0.75
ACKNOWLEDGMENTS   -0.70
nutella           -0.67
:—                -0.65
hastly            -0.64
lefs              -0.63
unce              -0.62
thut              -0.61
foon              -0.61
Positive Logits
clients           0.79
Áng               0.76
Barcelone         0.65
Berlín            0.65
Jardín            0.63
Vaticano          0.60
clients           0.60
Bahía             0.60
Jérusalem         0.59
ModelExpression   0.59
Activations Density 0.412%