INDEX
Explanations
legal terminology and case citations
Neuron Alignment
Index  Value  % of L₁
369    +0.16  0.9%
332    +0.15  0.9%
82     +0.12  0.7%
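A minimal sketch of how a neuron-alignment table like the one above could be produced. It assumes the feature is represented as a vector of per-neuron weights (for example, a dictionary decoder row), that rows are ranked by signed weight, and that "% of L₁" means |w_i| divided by the L1 norm of the whole vector. The function name `neuron_alignment` and the toy weights are hypothetical and not taken from the dashboard's own code.

```python
def neuron_alignment(weights, k=3):
    """Return the k neurons with the largest signed weights in a feature direction.

    Each entry is (neuron_index, weight, percent_of_l1). Assumption: percent_of_l1
    is |weight| / sum(|w| for w in weights) * 100.
    """
    l1 = sum(abs(w) for w in weights)
    if l1 == 0:
        return []  # an all-zero direction has no meaningful alignment
    # sorted() is stable even with reverse=True, so ties keep neuron order.
    ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    return [(i, weights[i], 100.0 * abs(weights[i]) / l1) for i in ranked[:k]]


# Hypothetical toy feature direction over six neurons (L1 norm = 1.0).
toy_weights = [0.0, -0.25, 0.0, 0.5, 0.0, 0.25]
print(neuron_alignment(toy_weights, k=2))  # [(3, 0.5, 50.0), (5, 0.25, 25.0)]
```

Because the percentages are relative to the L1 norm of the entire vector, small values such as 0.9% indicate that no single neuron dominates the feature's direction.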
Correlated Neurons
Index  Pearson Corr.  Cos Sim.
332    +0.16          0.02
82     +0.15          0.02
168    +0.12          0.02
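A hedged sketch of ranking neurons by how strongly their activations correlate with the feature's activations across tokens, assuming the correlation column is a Pearson correlation computed over the same token set. The source does not say which two vectors the cosine-similarity column compares, so `cosine_similarity` is included only as a generic helper. All names and toy data here are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # correlation is undefined for a constant series
    return cov / (sx * sy)


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (which vectors to compare is an assumption)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def top_correlated_neurons(feature_acts, neuron_acts, k=3):
    """Rank neurons by Pearson correlation with the feature's per-token activations.

    neuron_acts maps a neuron index to that neuron's per-token activations.
    """
    scores = {i: pearson(feature_acts, acts) for i, acts in neuron_acts.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]


# Hypothetical toy data: 4 tokens, 3 neurons.
feature = [0.0, 1.0, 0.0, 2.0]
neurons = {
    0: [0.0, 2.0, 0.0, 4.0],  # scaled copy of the feature
    1: [4.0, 2.0, 4.0, 0.0],  # perfectly anti-correlated
    2: [1.0, 1.0, 1.0, 1.0],  # constant, so its correlation is reported as 0.0
}
print(top_correlated_neurons(feature, neurons, k=2))  # neuron 0 ranks first
```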
Negative Logits
()         -1.63
ema        -1.55
dataset    -1.53
orems      -1.48
orect      -1.44
ony        -1.44
RESULT     -1.41
performs   -1.40
validate   -1.38
ew         -1.37
Positive Logits
ĻĤ                      4.74
¦                       4.32
¿½                      4.26
ŀ                       4.18
§                       4.02
(token not rendered)    3.96
↵↵                      3.96
(token not rendered)    3.96
↵                       3.96
↵ ↵                     3.96
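One common way to obtain positive and negative logit lists like the two above is to project the feature's output direction through the model's unembedding matrix and sort the resulting per-token values. This is a minimal sketch under that assumption; the source does not state how the dashboard computes or normalizes these values. The function `logit_effects`, the toy matrix, and the three-token vocabulary are hypothetical.

```python
def logit_effects(direction, unembed, vocab, k=3):
    """Project a feature direction through an unembedding matrix.

    direction: d_model floats (assumed to be the feature's decoder vector).
    unembed: d_model rows, each holding one float per vocabulary token (W_U).
    vocab: token strings, one per unembedding column.
    Returns (top_k_positive, top_k_negative) as lists of (token, logit).
    """
    logits = [
        sum(direction[r] * unembed[r][t] for r in range(len(direction)))
        for t in range(len(vocab))
    ]
    pairs = list(zip(vocab, logits))
    positive = sorted(pairs, key=lambda p: p[1], reverse=True)[:k]
    negative = sorted(pairs, key=lambda p: p[1])[:k]
    return positive, negative


# Hypothetical toy model: d_model = 2, vocabulary of 3 tokens.
direction = [1.0, -1.0]
unembed = [[2.0, 0.0, 1.0],
           [0.0, 3.0, 1.0]]
vocab = ["a", "b", "c"]
print(logit_effects(direction, unembed, vocab, k=2))
# ([('a', 2.0), ('c', 0.0)], [('b', -3.0), ('c', 0.0)])
```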
Activation Density: 0.050%
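A density of 0.050% corresponds to 0.0005, or roughly 1 token in 2,000. The sketch below assumes density means the percentage of sampled tokens on which the feature's activation is strictly above zero. The function name `activation_density` and its `threshold` parameter are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Percentage of tokens on which a feature's activation exceeds `threshold`."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: the feature fires on 1 of 2,000 tokens.
acts = [0.0] * 1999 + [1.3]
print(activation_density(acts))  # 0.05, i.e. 0.050%
```

A very low density like this is consistent with a narrowly scoped feature, such as one that fires mainly on legal terminology and case citations.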