INDEX
Explanations
safety and health-related instructions or guidelines
Neuron Alignment
Index   Value   % of L₁
792     +0.10   0.3%
752     +0.10   0.3%
198     +0.09   0.3%
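The Neuron Alignment columns can be read as follows, under an assumption the page does not state: "Value" is this feature's decoder weight on the given MLP neuron, and "% of L₁" is the absolute value of that weight divided by the L₁ norm of the feature's full decoder vector. A minimal sketch in Python; `neuron_alignment` and the toy weights are hypothetical.

```python
def neuron_alignment(decoder_vec, top_k=3):
    """Return (neuron index, weight, fraction of L1 norm) for the largest-magnitude weights.

    Assumption: decoder_vec holds one feature's decoder weights over the MLP neurons.
    """
    # L1 norm of the whole decoder vector.
    l1 = sum(abs(w) for w in decoder_vec)
    # Rank neurons by the magnitude of their decoder weight, largest first.
    ranked = sorted(range(len(decoder_vec)), key=lambda i: abs(decoder_vec[i]), reverse=True)
    return [(i, decoder_vec[i], abs(decoder_vec[i]) / l1) for i in ranked[:top_k]]


# Toy example with 5 hypothetical neurons.
weights = [0.05, -0.20, 0.40, 0.10, -0.25]
for idx, w, frac in neuron_alignment(weights):
    print(f"{idx:>3}  {w:+.2f}  {frac:.1%}")
```

Under this reading, a weight of +0.10 that accounts for only 0.3% of L₁ implies an L₁ norm of roughly 0.10 / 0.003 ≈ 33, so the decoder vector is spread across many neurons rather than aligned with any single one.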
Correlated Neurons
Index   Pearson Corr.   Cosine Sim.
198     +0.10           0.05
792     +0.10           0.04
678     +0.09           0.04
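The two Correlated Neurons columns are standard similarity measures. As an illustration only, the sketch below assumes both are computed between the feature's per-token activations and a neuron's per-token activations over the same token sample; the page does not say which vectors the cosine similarity compares, and `pearson`, `cosine_similarity`, and the sample values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def cosine_similarity(x, y):
    """Cosine of the angle between two vectors (no mean-centering)."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))


# Hypothetical per-token activations of the feature and of one neuron.
feature_acts = [0.0, 0.0, 1.2, 0.0, 0.8]
neuron_acts = [0.3, -0.1, 0.9, 0.2, 0.4]
print(f"Pearson correlation: {pearson(feature_acts, neuron_acts):+.2f}")
print(f"Cosine similarity:   {cosine_similarity(feature_acts, neuron_acts):.2f}")
```

Pearson correlation is the cosine similarity of the mean-centered vectors, so under this assumption the two columns differ only in whether the means are subtracted first.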
Negative Logits
journaux      -0.73
icônes        -0.69
frasi         -0.67
prochaines    -0.66
ouvrages      -0.65
devoirs       -0.64
négociations  -0.64
numéros       -0.63
spectateurs   -0.61
musées        -0.61
Positive Logits
Keny           0.75
Tapa           0.74
inspiring      0.66
racon          0.65
intrigu        0.64
Compañ         0.64
upvoted        0.64
Fait           0.64
inspirational  0.63
talented       0.62
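The logit lists rank vocabulary tokens by how strongly the feature pushes their output logits down or up. One common way to compute such direct logit effects, assumed here rather than confirmed by the page, is the dot product of the feature's decoder direction with each token's unembedding vector. The helper `logit_weights` and the tiny vocabulary are hypothetical; real dashboards may also normalize the scores.

```python
def logit_weights(decoder_vec, unembed, k=2):
    """Rank tokens by the dot product of their unembedding vector with decoder_vec.

    Assumption: unembed maps each token string to its unembedding column (length d_model).
    Returns (k most negative, k most positive) as lists of (token, score).
    """
    scores = {tok: sum(d * u for d, u in zip(decoder_vec, col)) for tok, col in unembed.items()}
    ranked = sorted(scores.items(), key=lambda kv: kv[1])  # ascending by score
    return ranked[:k], ranked[::-1][:k]


# Toy 3-dimensional model with a four-token vocabulary (all values hypothetical).
decoder = [1.0, -0.5, 0.0]
vocab = {
    "inspiring": [0.6, -0.2, 0.1],
    "talented": [0.5, 0.0, 0.3],
    "journaux": [-0.4, 0.5, 0.0],
    "musées": [-0.3, 0.4, 0.2],
}
negative, positive = logit_weights(decoder, vocab)
print("Negative:", negative)
print("Positive:", positive)
```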
Activation Density: 0.513%
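Activation density is the share of tokens on which the feature is active. A minimal sketch, assuming "active" means an activation strictly greater than zero (the page does not state its threshold); `activation_density` and the data are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose feature activation exceeds threshold."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical activations over 1,000 tokens: the feature fires on 5 of them.
acts = [0.0] * 1000
for i in (10, 250, 251, 600, 999):
    acts[i] = 0.7
print(f"Activation density {activation_density(acts):.3%}")
```

At 0.513%, the feature fires on roughly 1 token in 195 (1 / 0.00513 ≈ 195).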