INDEX
Explanations
names of individuals, likely public figures or experts
Neuron Alignment
Index    Value    % of L₁
1097     +0.16    0.5%
1978     +0.14    0.4%
1741     +0.11    0.3%
Correlated Neurons
Index    P. Corr.    Cos Sim.
1097     +0.16       0.05
981      +0.14       0.04
1654     +0.11       0.04
Negative Logits
affez      -1.26
allarg     -1.26
cammin     -1.26
parati     -1.25
soggior    -1.24
rilass     -1.11
dirit      -1.07
cioc       -1.07
tramont    -1.07
lele       -1.07
Positive Logits
’          0.77
himself    0.76
'          0.74
has        0.59
is         0.58
was        0.56
׳          0.56
had        0.55
went       0.55
herself    0.54
Activations Density: 0.125%