INDEX
Explanations
names of popular culture references or famous personalities
Neuron Alignment
Index   Value   % of L₁
2019    +0.38   1.3%
304     +0.19   0.7%
381     +0.15   0.5%
Correlated Neurons
Index   P. Corr.   Cos Sim.
2019    +0.38      0.04
924     +0.19      0.04
927     +0.15      0.03
Negative Logits
requently    -0.67
amaged       -0.62
tupperware   -0.61
tilizer      -0.58
requent      -0.55
WENT         -0.55
rapnel       -0.54
余额         -0.54
underval     -0.54
ocused       -0.54
Positive Logits
mikrofon     1.07
silikon      1.00
optik        0.94
keramik      0.93
kafe         0.91
komik        0.89
marte        0.88
kompres      0.88
confé        0.87
karton       0.84
Activations Density: 0.107%