INDEX
Explanations
phrases related to death and punishment
Neuron Alignment
Index    Value    % of L₁
1296     +0.15    0.6%
397      +0.13    0.5%
421      +0.13    0.5%
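The dashboard does not define the "% of L₁" column. One plausible reading is that it gives each neuron's share of the L1 norm of the feature's direction, that is, the absolute value of the neuron's entry divided by the sum of absolute values over all neurons. The sketch below works under that assumption only; the function name `l1_share` and the toy vector are hypothetical and do not come from the dashboard.

```python
from typing import List


def l1_share(values: List[float]) -> List[float]:
    """Return each entry's share of the vector's L1 norm.

    Assumption: "% of L1" means |v_i| / sum_j |v_j|. This is an
    interpretation of the column header, not a documented definition.
    """
    total = sum(abs(v) for v in values)
    if total == 0:
        # An all-zero vector has no meaningful shares.
        return [0.0 for _ in values]
    return [abs(v) / total for v in values]


# Hypothetical toy vector, not data from the dashboard.
toy = [3.0, -1.0]
shares = l1_share(toy)
print([f"{s:.1%}" for s in shares])
```

Under this reading, a neuron with value +0.15 and a share of 0.6% would imply a total L1 norm of roughly 25, which is at least consistent with the other two rows (+0.13 at 0.5%).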
Correlated Neurons
Index    P. Corr.    Cos Sim.
1296     +0.15       0.04
421      +0.13       0.04
397      +0.13       0.03
Negative Logits
Datuak         -0.62
渥             -0.58
translateX     -0.49
Bourgoin       -0.48
aspi           -0.48
gere           -0.45
bewerken       -0.45
Opin           -0.44
Enllaces       -0.44
FactoryBean    -0.44
Positive Logits
death     1.16
death     1.14
DEATH     1.11
Death     1.11
Death     1.10
DEATH     0.99
deaths    0.93
Deaths    0.90
Deaths    0.87
Muerte    0.85
Activations Density: 0.085%