INDEX
Explanations
obituaries containing detailed information about individuals' lives
Neuron Alignment
Index   Value   % of L₁
50      +0.30   1.1%
453     +0.09   0.4%
1533    +0.08   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1533    +0.30      0.04
1169    +0.09      0.05
509     +0.08      0.04
Negative Logits
<bos>      -1.29
align      -0.68
<?         -0.67
attract    -0.67
prevent    -0.66
<?         -0.66
seek       -0.65
keep       -0.64
ⓧ          -0.63
continue   -0.63
Positive Logits
rafra      1.59
véhic      1.57
ftu        1.44
vété       1.42
»>         1.41
?...       1.41
délib      1.39
accla      1.37
écout      1.36
soulign    1.35
Activations Density 0.560%