INDEX
Explanations
mentions of specific entities, possibly related to criminal activities or locations
Neuron Alignment
Index    Value    % of L₁
404      +0.10    0.3%
172      +0.09    0.3%
90       +0.09    0.3%
Correlated Neurons
Index    P. Corr.    Cos Sim.
1120     +0.10       0.04
1343     +0.09       0.04
1194     +0.09       0.03
Negative Logits
Mère          -1.32
Meilleur      -1.04
Aéroport      -0.98
paillettes    -0.98
Comédie       -0.95
Sén           -0.91
Messieurs     -0.88
Septembre     -0.87
Famille       -0.85
chèvre        -0.85
Positive Logits
Cou    2.23
Bou    1.92
cou    1.88
Cou    1.86
Bou    1.73
cou    1.72
Sou    1.70
Kou    1.70
Kou    1.64
Mou    1.62
Activations Density 0.243%