INDEX
Explanations
phrases related to the value of human lives
Neuron Alignment
Index   Value   % of L₁
1482    +0.16   0.5%
597     +0.13   0.4%
874     +0.13   0.4%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1482    +0.16      0.03
1296    +0.13      0.03
874     +0.13      0.03
Negative Logits
elek      -0.91
uhr       -0.91
kask      -0.88
karton    -0.86
silikon   -0.86
kram      -0.84
naer      -0.84
makro     -0.83
quoc      -0.81
moza      -0.81
Positive Logits
lives   1.15
lives   1.04
Lives   0.99
LIVES   0.96
Lives   0.95
life    0.87
life    0.85
lived   0.80
Life    0.78
LIFE    0.78
Activations Density 0.062%