INDEX
Explanations
references to care services
Neuron Alignment
Index   Value   % of L₁
376     +0.16   0.9%
451     +0.14   0.8%
173     +0.11   0.6%
Correlated Neurons
Index   P. Corr.   Cos Sim.
451     +0.16      0.02
385     +0.14      0.02
277     +0.11      0.02
Negative Logits
³       -2.43
Ļª      -2.43
ĨĴ      -2.38
¯       -2.34
£       -2.18
ı       -2.09
¾       -2.06
ĵ       -2.02
§       -1.98
ĥ½      -1.95
POSITIVE LOGITS
thouse  1.89
gens    1.59
leaks   1.53
azzo    1.50
fully   1.49
Instr   1.44
uable   1.42
gate    1.39
jax     1.39
leaked  1.39
Activations Density 0.007%