INDEX
Explanations
phrases indicating legal definitions or statuses
Neuron Alignment
Index    Value    % of L₁
376      +0.19    1.1%
156      +0.16    0.9%
144      +0.10    0.6%
Correlated Neurons
Index    P. Corr.    Cos Sim.
24       +0.19       0.01
463      +0.16       0.01
384      +0.10       0.01
Negative Logits
±     -2.91
ĥ½    -2.87
½     -2.83
ģ     -2.64
¸     -2.62
ĭ     -2.52
į     -2.50
®     -2.47
ĩ     -2.44
IJ    -2.41
Positive Logits
Justice         1.47
Privacy         1.40
ulsion          1.38
ibus            1.36
orect           1.36
fraction        1.33
wrong           1.30
olecular        1.30
interference    1.29
pardon          1.28
Activations Density 0.117%