Explanations
technical language and details related to scientific or engineering processes
Neuron Alignment
Index    Value    % of L₁
388      +0.13    0.8%
216      +0.13    0.7%
249      +0.11    0.7%
Correlated Neurons
Index    P. Corr.    Cos Sim.
168      +0.13       0.10
388      +0.13       0.04
363      +0.11       0.07
Negative Logits
documentclass    -1.74
$]{}             -1.68
rounds           -1.50
chers            -1.49
$$               -1.49
]}               -1.47
$                -1.46
awning           -1.46
pping            -1.40
indoors          -1.40
Positive Logits
↵          2.37
(blank)    2.37
↵          2.37
↵↵         2.37
(blank)    2.37
↵↵         2.37
↵          2.37
č↵         2.37
↵          2.37
↵          2.37
Activations Density 3.075%