INDEX
Explanations
words related to purity or cleanliness
Neuron Alignment
Index   Value   % of L₁
1325    +0.11   0.4%
680     +0.11   0.4%
938     +0.10   0.4%
Correlated Neurons
Index   P. Corr.   Cos Sim.
938     +0.11      0.02
680     +0.11      0.01
1325    +0.10      0.01
Negative Logits
metálico         -0.51
bakteri          -0.48
débil            -0.48
ôtel             -0.48
Newspapers       -0.48
redondo          -0.47
extraña          -0.45
contemporánea    -0.45
metálica         -0.44
vacía            -0.43
Positive Logits
PURE        1.22
pure        1.18
Pure        1.14
pure        1.14
Pure        1.11
PURE        1.05
uncin       0.94
purity      0.93
pura        0.85
cytoplas    0.84
Activations Density 0.064%