INDEX
Explanations
mentions related to accusations or legal issues
Neuron Alignment
Index   Value   % of L₁
313     +0.14   0.5%
198     +0.12   0.4%
1810    +0.10   0.3%
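One plausible reading of the Neuron Alignment columns is sketched below. It assumes that "Value" is the feature's decoder weight on a given MLP neuron and that "% of L₁" is the magnitude of that weight as a share of the decoder vector's L1 norm. The function name `neuron_alignment`, the list-of-floats layout, and ranking by absolute weight are all hypothetical choices for illustration; the dashboard's actual computation may differ.

```python
def neuron_alignment(decoder_row, top_k=3):
    """Hypothetical sketch: return (neuron index, weight, % of L1 norm)
    for the top_k neurons ranked by absolute decoder weight.

    Assumes decoder_row is a list of floats with one entry per MLP neuron.
    """
    l1 = sum(abs(w) for w in decoder_row)  # L1 norm of the decoder vector
    if l1 == 0:
        raise ValueError("decoder_row has zero L1 norm")
    # Rank neuron indices by the magnitude of their weight, largest first.
    ranked = sorted(range(len(decoder_row)),
                    key=lambda i: abs(decoder_row[i]), reverse=True)
    return [(i, decoder_row[i], 100.0 * abs(decoder_row[i]) / l1)
            for i in ranked[:top_k]]


# Toy decoder row (made-up numbers) whose L1 norm is exactly 1.0.
print(neuron_alignment([0.25, -0.5, 0.125, 0.125], top_k=2))
```

Under this reading, a small percentage such as 0.5% means the top neuron carries only a tiny fraction of the decoder vector's total weight, i.e. the feature is spread across many neurons rather than aligned with any single one.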
Correlated Neurons
Index   Pearson Corr.   Cosine Sim.
313     +0.14           0.05
1810    +0.12           0.04
1691    +0.10           0.04
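The two Correlated Neurons metrics can be sketched as follows, assuming both are computed between two per-token activation vectors: the feature's activations and a neuron's activations over the same tokens. Pearson correlation centers each vector first; cosine similarity does not. Which vectors the dashboard actually compares for "Cos Sim." is not stated here, so treat this pairing and the helper names `pearson` and `cosine` as hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    # Covariance and standard deviations (the 1/n factors cancel out).
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (no centering)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))


# Made-up per-token activations for one feature and one neuron.
feature_acts = [0.0, 0.0, 1.2, 0.0, 0.4]
neuron_acts = [0.1, -0.3, 0.9, 0.2, 0.0]
print(pearson(feature_acts, neuron_acts), cosine(feature_acts, neuron_acts))
```

Both helpers raise `ZeroDivisionError` if either input is constant (for Pearson) or all zeros (for cosine); a real pipeline would need to guard against those cases.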
Negative Logits
fédé         -0.67
alliés       -0.58
délib        -0.55
semblables   -0.53
présidenti   -0.51
traités      -0.50
inégal       -0.50
Etimo        -0.49
destinées    -0.48
récents      -0.48
Positive Logits
who      0.70
those    0.70
those    0.65
Those    0.63
whom     0.63
who      0.60
THOSE    0.58
mène     0.58
tanong   0.58
Those    0.57
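The Negative and Positive Logits lists above are the k lowest and k highest entries of a per-token score. A minimal sketch of that selection step, assuming one scalar logit weight per vocabulary token has already been computed (for example, by projecting the feature's output direction onto the unembedding, which is an assumption here), is shown below. The function name `top_and_bottom` is hypothetical. Repeated entries such as "those" and "who" are presumably distinct tokens (for example, with and without a leading space) that render identically.

```python
def top_and_bottom(logit_weights, vocab, k=10):
    """Hypothetical sketch: return the k most negative and the k most
    positive (token, weight) pairs, each list ordered by magnitude."""
    # Sort all (token, weight) pairs from lowest to highest weight.
    pairs = sorted(zip(vocab, logit_weights), key=lambda p: p[1])
    negative = pairs[:k]          # lowest weights first
    positive = pairs[::-1][:k]    # highest weights first
    return negative, positive


# Toy vocabulary and weights (made-up numbers).
neg, pos = top_and_bottom([0.5, -0.2, 0.9, -0.7], ["a", "b", "c", "d"], k=2)
print(neg, pos)
```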
Activation Density: 0.088%
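Activation density is most naturally read as the percentage of tokens on which the feature fires. At 0.088%, that is roughly 88 tokens out of every 100,000. A minimal sketch, assuming "fires" means an activation strictly above a threshold of zero (the threshold and the helper name `activation_density` are assumptions, not taken from the dashboard):

```python
def activation_density(activations, threshold=0.0):
    """Hypothetical sketch: percentage of tokens whose feature activation
    is strictly greater than threshold."""
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Ten made-up token activations, one of which is non-zero -> 10.0 percent.
print(activation_density([0.0] * 9 + [1.5]))
```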