INDEX
Explanations
contractions of verbs with negative indicators
Neuron Alignment
Index   Value   % of L₁
381     +0.12   0.4%
1557    +0.10   0.3%
1077    +0.10   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1557    +0.12      0.03
1604    +0.10      0.02
689     +0.10      0.02
Negative Logits
makro      -1.12
akut       -1.10
solidar    -1.04
ideolog    -1.01
kac        -1.00
alkoh      -0.97
antik      -0.97
buk        -0.97
teras      -0.96
optik      -0.96
Positive Logits
unspeak      1.16
tolerably    1.09
gaily        1.08
unwarran     1.04
apprehen     1.03
intersper    1.02
indestru     1.01
disagre      1.01
shenan       0.99
Shakspeare   0.98
Activations Density: 0.175%