Explanations
Names of people, especially fixed phrases such as 'Donald Trump' or 'Ivanka Trump'.
Neuron Alignment
Index   Value   % of L₁
1043    +0.09   0.3%
1097    +0.09   0.3%
498     +0.08   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
981     +0.09      0.06
1654    +0.09      0.05
1493    +0.08      0.04
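The two columns in the Correlated Neurons table measure different things, which is why they need not agree. The sketch below assumes "P. Corr." abbreviates Pearson correlation and "Cos Sim." cosine similarity; the page does not say which vectors are compared, so the inputs here are made-up activation lists and the function names are hypothetical.

```python
import math

def pearson_corr(x, y):
    """Pearson correlation: covariance of x and y divided by the product of their spreads."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def cosine_sim(x, y):
    """Cosine similarity: dot product divided by the product of the vector norms (no mean-centering)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

# Illustrative inputs only, not data from the page.
feature_acts = [1.0, 2.0, 3.0]
neuron_acts = [11.0, 12.0, 13.0]
print(pearson_corr(feature_acts, neuron_acts))
print(cosine_sim(feature_acts, neuron_acts))
```

Pearson correlation subtracts each vector's mean before comparing, while cosine similarity does not, so a constant offset changes the second value but not the first.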
Negative Logits
bandung      -1.25
jaya         -1.20
uhr          -1.20
venezuela    -1.17
kasa         -1.15
disagre      -1.15
allarg       -1.14
lele         -1.13
fatis        -1.13
ftu          -1.12
Positive Logits
ftagPool      0.60
Gemeinsame    0.55
implicitly    0.54
Mac           0.52
Перейти       0.51
’             0.51
<eos>         0.50
Mc            0.49
LeBron        0.49
Go            0.49
Activations Density 0.303%
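A density figure like the one above can be read as the share of sampled tokens on which the feature fires. The sketch below assumes exactly that definition (activation strictly above a threshold of zero); the page does not state how the number was computed, and `activation_density` is a hypothetical helper name.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds the threshold (assumed definition)."""
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)

# Illustrative inputs only: one active token out of four.
sample = [0.0, 0.0, 0.5, 0.0]
print(f"{activation_density(sample):.3%}")
```

Under this reading, a density of 0.303% would correspond to the feature being active on roughly 3 of every 1,000 sampled tokens.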