INDEX
Explanations
phrases related to social issues and arguments
Neuron Alignment
Index   Value   % of L₁
50      +0.11   0.3%
1870    +0.10   0.3%
2019    +0.10   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1806    +0.11      0.06
1265    +0.10      0.06
1224    +0.10      0.06
Negative Logits
vœux          -0.85
négociations  -0.84
ecru          -0.83
leçons        -0.82
hairc         -0.81
bénéfices     -0.79
indestru      -0.78
Bartholo      -0.77
McInt         -0.77
preuves       -0.77
Positive Logits
minuta        0.77
psicologia    0.71
fras          0.69
curiosa       0.66
furg          0.65
interessa     0.63
crede         0.63
kawa          0.62
resources     0.62
alpes         0.61
Activation Density: 0.357%