INDEX
Explanations
comparative phrases, strong opinions, and legal terms
Neuron Alignment
Index   Value   % of L₁
1978    +0.12   0.3%
1253    +0.09   0.2%
678     +0.09   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
678     +0.12      0.05
1060    +0.09      0.04
1116    +0.09      0.03
Negative Logits
apprehen     -1.13
Daven        -1.13
unspeak      -1.11
McLaugh      -1.08
Vaugh        -1.08
Middles      -1.05
intersper    -1.04
withal       -1.01
impra        -1.01
intrigu      -1.00
Positive Logits
lapto        0.68
studier      0.65
ideolog      0.64
nosi         0.63
váy          0.59
typelib      0.59
género       0.58
ஒ            0.57
personali    0.57
yogur        0.56
Activations Density 0.340%