INDEX
Explanations
mentions of specific names, potentially related to journalism or reporting
Neuron Alignment
Index    Value    % of L₁
1044     +0.18    1.0%
131      +0.13    0.7%
1334     +0.13    0.7%
Correlated Neurons
Index    P. Corr.    Cos Sim.
1044     +0.18       0.05
1097     +0.13       0.05
227      +0.13       0.05
Negative Logits
Token         Logit
<bos>         -0.90
papà          -0.59
chante        -0.59
jectures      -0.58
portait       -0.58
plaatst       -0.56
voirs         -0.55
curé          -0.55
contributo    -0.54
sindaco       -0.54
Positive Logits
Token     Logit
Roger     1.48
Roger     1.40
ROGER     1.29
roger     1.26
roger     1.05
Rogers    0.94
Rogers    0.89
Rog       0.81
Rog       0.76
ROGERS    0.71
Activations Density 0.505%