INDEX
Explanations
structured legal phrases or terms
Neuron Alignment
Index    Value    % of L₁
73       +0.15    0.8%
153      +0.14    0.8%
414      +0.13    0.8%
Correlated Neurons
Index    P. Corr.    Cos Sim.
494      +0.15       0.44
209      +0.14       0.28
402      +0.13       0.26
Negative Logits
]):      -1.69
ycin     -1.69
elijk    -1.63
gio      -1.59
](       -1.55
)):      -1.53
\]]{}    -1.43
))=      -1.40
glich    -1.38
](       -1.38
Positive Logits
¾          1.95
¹          1.85
³          1.84
ly         1.66
uscript    1.65
://        1.63
ī          1.63
bury       1.52
oting      1.47
nered      1.46
Activations Density 1.660%