Explanations
phrases related to legal documents
Neuron Alignment
Index    Value    % of L₁
674      +0.46    2.2%
1875     +0.08    0.4%
1535     +0.08    0.4%
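The "% of L₁" column appears to express each neuron's weight as a share of the L₁ norm of the feature's weight vector over neurons. Below is a minimal sketch under that assumption. The dashboard does not say here whether it uses the encoder or decoder vector, or whether it ranks by signed value or by magnitude, so `neuron_alignment` and its inputs are hypothetical.

```python
def neuron_alignment(weights, k=3):
    """Return the top-k neurons of a feature's weight vector.

    Each entry is (neuron index, weight, weight as a fraction of the
    vector's L1 norm). Ranking by signed weight is an assumption.
    """
    l1 = sum(abs(w) for w in weights)
    # Neuron indices sorted by signed weight, largest first
    ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    return [(i, weights[i], weights[i] / l1) for i in ranked[:k]]


# Toy weight vector (hypothetical, not taken from the model)
toy = [0.1, -0.3, 0.5, 0.05, -0.05]
for idx, val, frac in neuron_alignment(toy, k=2):
    print(f"{idx}  {val:+.2f}  {frac:.1%}")
```

Under this reading, a small "% of L₁" for the top neuron means the feature's weight is spread across many neurons rather than concentrated in one.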
Correlated Neurons
Index    Pearson Corr.    Cos Sim.
15       +0.46            0.21
1403     +0.08            0.19
354      +0.08            0.21
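This table pairs a Pearson correlation with a cosine similarity. Assume both compare the feature's activations with a neuron's activations over the same set of tokens. Then the two measures differ only in mean-centring: Pearson correlation is the cosine similarity of the centred vectors. The helper names below are hypothetical.

```python
import math


def cosine(x, y):
    """Cosine similarity of two equal-length vectors (no centring)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm  # raises ZeroDivisionError for an all-zero vector


def pearson(x, y):
    """Pearson correlation: cosine similarity after subtracting each mean."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return cosine([a - mx for a in x], [b - my for b in y])


# Toy activations over five tokens (hypothetical values)
feature_acts = [0.0, 0.0, 2.0, 0.0, 1.0]
neuron_acts = [0.1, 0.2, 1.5, 0.0, 0.9]
print(round(pearson(feature_acts, neuron_acts), 3),
      round(cosine(feature_acts, neuron_acts), 3))
```

Sparse feature activations are mostly zero, so their mean is near zero. Even so, centring can still shift the result noticeably, which is why the two columns need not agree.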
Negative Logits
reluct      -8.89
increa      -8.84
inev        -8.41
impra       -8.26
disagre     -8.25
affor       -8.20
depic       -8.13
volunte     -8.13
encomp      -8.05
shenan      -8.05
Positive Logits
<bos>            13.34
Paglinawan       2.47
betweenstory     2.36
Walkover         2.27
RegressionTest   2.25
Autoritní        2.24
Italijani        2.23
はじめに          2.22
↵                2.21
Panamoan         2.18
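The positive and negative logit lists are presumably the tokens whose unembedding columns have the largest and smallest dot product with the feature's output direction. Below is a minimal sketch of that ranking. The function name and toy matrices are hypothetical. The sketch also ignores any layer normalisation the real computation may apply before the unembedding.

```python
def top_and_bottom_logits(direction, W_U, vocab, k=3):
    """Rank tokens by direction · W_U[:, t].

    direction: feature output direction, length d_model
    W_U: unembedding as a list of d_model rows, each of length len(vocab)
    """
    effects = [
        sum(direction[r] * W_U[r][t] for r in range(len(direction)))
        for t in range(len(vocab))
    ]
    order = sorted(range(len(vocab)), key=lambda t: effects[t])  # ascending
    bottom = [(vocab[t], effects[t]) for t in order[:k]]
    top = [(vocab[t], effects[t]) for t in reversed(order[-k:])]
    return top, bottom


# Toy example: d_model = 2, vocabulary of 4 tokens (hypothetical)
direction = [1.0, -1.0]
W_U = [[2.0, 0.0, 1.0, -1.0],
       [0.0, 2.0, 1.0, 1.0]]
top, bottom = top_and_bottom_logits(direction, W_U, ["a", "b", "c", "d"], k=1)
print(top, bottom)
```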
Activations Density: 0.071%
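Activation density is read here as the fraction of sampled tokens on which the feature is active. At 0.071%, that is roughly 7 tokens in 10,000. The sketch below assumes a strict `> 0` threshold, which is a guess about the dashboard's convention. The function name and the sample are hypothetical.

```python
def activation_density(acts, threshold=0.0):
    """Fraction of tokens on which the feature's activation exceeds `threshold`."""
    return sum(1 for a in acts if a > threshold) / len(acts)


# Hypothetical sample: the feature fires on 71 of 100,000 tokens
acts = [0.0] * 99_929 + [1.0] * 71
print(f"{activation_density(acts):.3%}")
```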