INDEX
Explanations
references to user profiles
Neuron Alignment
Index   Value   % of L₁
43      +0.13   0.8%
269     +0.12   0.7%
178     +0.12   0.7%
Correlated Neurons
Index   P. Corr.   Cos Sim.
178     +0.13      0.02
43      +0.12      0.01
269     +0.12      0.01
Negative Logits
Token   Logit
ĥ½      -4.11
Ĥ       -3.83
¤       -3.79
ķ       -3.67
§       -3.64
ÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤÃĥÃĤ   -3.54
¿½      -3.53
Ĥ¬      -3.52
ĺ       -3.52
³       -3.52
Positive Logits
Token       Logit
(“          1.81
area        1.69
apparatus   1.67
styles      1.66
aggio       1.65
holder      1.61
album       1.61
(@"         1.61
vier        1.60
books       1.59
Activations Density 0.013%