INDEX
Explanations
mentions of hair-related terms like "hair salon," "salon," and descriptions of hair
Neuron Alignment
Index   Value   % of L₁
204     +0.13   0.5%
896     +0.12   0.4%
1026    +0.11   0.4%
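The page does not define the "% of L₁" column. One plausible reading, offered here as an assumption, is that it gives each entry's absolute value as a share of the L₁ norm (the sum of absolute values) of the whole vector. The listed rows are consistent with that reading for an L₁ norm somewhere in the high 20s. The sketch below uses a hypothetical helper `l1_share` and a toy vector, not the real weights.

```python
def l1_share(weights, index):
    """Return |weights[index]| as a percentage of the vector's L1 norm.

    Assumption: this is how the "% of L1" column is derived; the
    source page does not state the formula.
    """
    l1_norm = sum(abs(w) for w in weights)
    if l1_norm == 0:
        raise ValueError("L1 norm is zero; the share is undefined")
    return 100.0 * abs(weights[index]) / l1_norm


# Toy vector for illustration only (not taken from the page).
toy_weights = [0.5, -0.25, 0.25, 0.0]
share_of_first = l1_share(toy_weights, 0)  # 0.5 out of an L1 norm of 1.0
```

Under this reading, small percentages such as 0.4% to 0.5% would mean that no single neuron dominates the vector.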
Correlated Neurons
Index   P. Corr.   Cos Sim.
204     +0.13      0.02
896     +0.12      0.02
1026    +0.11      0.02
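"P. Corr." and "Cos Sim." presumably abbreviate Pearson correlation and cosine similarity. The page does not say which pairs of vectors are being compared, so the sketch below only shows the two standard formulas on toy lists; the function names are illustrative.

```python
import math


def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their Euclidean norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def pearson_correlation(a, b):
    """Pearson correlation, computed as the cosine similarity of the mean-centered inputs."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    centered_a = [x - mean_a for x in a]
    centered_b = [y - mean_b for y in b]
    return cosine_similarity(centered_a, centered_b)
```

The two measures differ only in the mean-centering step, so they can diverge noticeably when the inputs have a nonzero mean.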
Negative Logits
bascul      -0.51
ché         -0.51
Dawg        -0.51
Pockets     -0.50
cushi       -0.48
mnop        -0.48
Stretcher   -0.47
Assorted    -0.47
Ferdin      -0.47
Wrench      -0.46
Positive Logits
hair        1.41
Hair        1.32
hair        1.29
Hair        1.29
HAIR        1.23
HAIR        1.08
hairs       1.00
haired      0.99
haired      0.92
hairs       0.87
Activations Density: 0.078%