INDEX
Explanations
phrases related to surviving extreme conditions or remarkable stories of survival
Neuron Alignment
Index   Value   % of L₁
198     +0.20   0.6%
752     +0.14   0.4%
872     +0.11   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
198     +0.20      0.08
752     +0.14      0.06
16      +0.11      0.07
Negative Logits
decided    -0.58
-          -0.56
said       -0.56
re         -0.55
<eos>      -0.54
together   -0.54
↵↵         -0.53
↵          -0.53
must       -0.53
–          -0.53
Positive Logits
Traité      1.34
Février     1.30
Sén         1.30
Souha       1.27
Mémoires    1.25
Docteur     1.24
carrefour   1.23
curé        1.23
Áng         1.22
haer        1.21
Activations Density: 0.715%