Explanations
titles, names, and specific attributes related to individuals and their roles
Neuron Alignment
Index   Value   % of L₁
23      +0.16   0.9%
10      +0.14   0.8%
266     +0.13   0.8%
Correlated Neurons
Index   P. Corr.   Cos Sim.
23      +0.16      0.04
10      +0.14      0.02
150     +0.13      0.03
Negative Logits
Token      Logit
analogy    -1.73
itate      -1.64
âĢIJ       -1.62
rization   -1.58
ificial    -1.50
tbl        -1.49
thouse     -1.49
ismo       -1.49
endas      -1.48
idades     -1.45
Positive Logits
Token   Logit
ĻĤ      3.12
Į       2.75
ĨĴ      2.54
¦       2.51
²       2.50
ľĵ      2.49
Īĺ      2.49
ĺ       2.39
©       2.38
Ĺ       2.36
Activations Density: 0.191%