INDEX
Explanations
phrases related to wheels and tires
Neuron Alignment
Index   Value   % of L₁
90      +0.14   0.6%
61      +0.13   0.5%
1677    +0.13   0.5%
Correlated Neurons
Index   P. Corr.   Cos Sim.
61      +0.14      0.02
1677    +0.13      0.02
1691    +0.13      0.02
Negative Logits
guarante     -1.11
secon        -0.97
squa         -0.94
chrysler     -0.93
volunte      -0.92
accla        -0.92
overla       -0.91
intermitt    -0.90
maneu        -0.89
coö          -0.89
Positive Logits
wheel     1.54
wheel     1.46
Wheel     1.38
wheels    1.35
Wheel     1.32
Wheels    1.13
WHEEL     1.11
wheels    1.09
WHEEL     1.06
Wheels    0.97
Activations Density: 0.090%