Explanations
phrases related to dust and dirt
Neuron Alignment
Index    Value    % of L₁
966      +0.14    0.5%
411      +0.14    0.5%
1837     +0.13    0.5%
Correlated Neurons
Index    P. Corr.    Cos Sim.
1837     +0.14       0.03
966      +0.14       0.02
201      +0.13       0.01
Negative Logits
Oppen     -0.51
Rela      -0.51
Tril      -0.51
Vira      -0.49
Lilian    -0.49
Rami      -0.49
Ibid      -0.49
Beal      -0.48
Wre       -0.48
Rana      -0.48
Positive Logits
dust       1.36
dust       1.28
DUST       1.25
Dust       1.20
Dust       1.17
dusting    1.00
dusted     0.92
ardust     0.85
Dusty      0.82
dusty      0.78
Activations Density 0.085%