INDEX
Explanations
contractions with apostrophes (') followed by a word
Neuron Alignment
Index   Value   % of L₁
478     +0.11   0.3%
1741    +0.11   0.3%
50      +0.09   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
19      +0.11      0.10
873     +0.11      0.08
1902    +0.09      0.09
Negative Logits
Amigos        -0.51
桌面          -0.50
tortas        -0.48
pelo          -0.48
ung           -0.48
therapeutic   -0.47
puerta        -0.47
soft          -0.47
students      -0.46
事务          -0.46
Positive Logits
nece     1.31
purcha   1.29
effe     1.28
desir    1.18
waer     1.15
inev     1.14
lende    1.13
bett     1.12
thut     1.12
fto      1.11
Activations Density 0.459%