INDEX
Explanations
instances of negation, specifically the word "not."
Neuron Alignment
Index  Value  % of L₁
1262   +0.14  0.5%
950    +0.11  0.4%
1974   +0.11  0.4%
Correlated Neurons
Index  Pearson Corr.  Cos Sim.
1262   +0.14          0.07
1974   +0.11          0.06
878    +0.11          0.06
Negative Logits
arpa        -0.47
Hymen       -0.46
Krieges     -0.45
oplasma     -0.45
Mga         -0.44
MethodInfo  -0.44
savent      -0.44
ثيق         -0.43
arakhand    -0.43
“……”        -0.43
Positive Logits
philanth    0.90
hairc       0.82
vectra      0.81
vhs         0.80
shenan      0.79
necessari   0.79
ktm         0.79
ikkert      0.78
toshiba     0.77
necessarie  0.75
Activations Density: 0.222%