INDEX
Explanations
people describing their background and experiences
Neuron Alignment
Index   Value   % of L₁
227     +0.10   0.3%
1741    +0.09   0.3%
845     +0.09   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
658     +0.10      0.06
1692    +0.09      0.03
1727    +0.09      0.03
Negative Logits
jandro        -0.55
malheureux    -0.50
stasia        -0.50
)>=           -0.48
heran         -0.47
berken        -0.47
eckel         -0.46
makam         -0.46
ikyuu         -0.46
侵略          -0.46
Positive Logits
myself          0.74
Sebagai         0.64
<bos>           0.62
GEBURTSDATUM    0.61
Setiap          0.60
Bukan           0.59
álbum           0.58
expandindo      0.56
фициальный      0.56
Sklici          0.56
Activations Density 0.550%