INDEX
Explanations
file paths and directory structures related to software or commands
Neuron Alignment
Index    Value    % of L₁
66       +0.13    0.7%
118      +0.13    0.7%
139      +0.11    0.6%
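The "% of L₁" column is not defined on this page. A minimal sketch of one plausible reading follows, assuming the column reports the absolute value of one entry as a fraction of the L₁ norm of the whole vector. The function name `l1_share` and the example vector are hypothetical and are not taken from the dashboard.

```python
def l1_share(weights, index):
    """Return |weights[index]| as a fraction of the vector's L1 norm.

    Assumption: this mirrors what a "% of L1" column might report;
    the dashboard itself does not state its formula.
    """
    l1 = sum(abs(w) for w in weights)
    if l1 == 0:
        raise ValueError("L1 norm is zero, so the share is undefined")
    return abs(weights[index]) / l1


# Hypothetical 4-entry vector for illustration only.
example = [0.5, -1.0, 0.25, 0.25]
share = l1_share(example, 0)  # 0.5 / 2.0
print(f"{share:.1%}")  # prints "25.0%"
```

Under this reading, a value of +0.13 at 0.7% would imply an L₁ norm of roughly 18 to 19 for the full vector, which is consistent with the +0.11 row at 0.6%.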
Correlated Neurons
Index    P. Corr.    Cos Sim.
66       +0.13       0.02
139      +0.13       0.02
18       +0.11       0.02
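The two columns above can differ a lot for the same neuron, which is easier to see with the standard definitions written out. The sketch below assumes "P. Corr." is the Pearson correlation and "Cos Sim." is the cosine similarity. Which pair of vectors the dashboard compares for each column is not stated here, so the inputs are hypothetical.

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between x and y (no mean-centering)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    # Raises ZeroDivisionError if either vector is all zeros.
    return dot / (norm_x * norm_y)


def pearson_correlation(x, y):
    """Pearson correlation: cosine similarity of the mean-centered vectors."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return cosine_similarity(
        [a - mean_x for a in x],
        [b - mean_y for b in y],
    )


# Hypothetical inputs: same direction in raw terms, opposite trends.
x = [1.0, 2.0, 3.0]
y = [3.0, 2.0, 1.0]
cos_xy = cosine_similarity(x, y)        # 10 / 14, about 0.714
pearson_xy = pearson_correlation(x, y)  # about -1.0
```

Centering is the only difference between the two measures, so a shared offset can raise or lower one without moving the other.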
Negative Logits
Token         Logit
ľĵ            -1.73
Ĥ             -1.65
same          -1.63
£             -1.58
YPT           -1.56
neutrality    -1.55
ĵ             -1.53
↵             -1.47
↵             -1.47
(no visible token text)    -1.47
Positive Logits
Token     Logit
ocular    2.06
Laden     1.97
estock    1.95
hood      1.80
hardt     1.77
ness      1.71
shaw      1.67
heit      1.66
endor     1.61
itude     1.61
Activations Density: 0.196%