INDEX
Explanations
legal language related to regulations and waivers
Neuron Alignment
Index   Value   % of L₁
50      +0.18   0.7%
776     +0.10   0.4%
468     +0.10   0.4%
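The page does not define the "% of L₁" column. A minimal sketch of one plausible reading, assuming it is each entry's absolute value divided by the L₁ norm (sum of absolute values) of the whole vector. The function name `l1_share` and the toy vector are hypothetical and are not taken from the dashboard.

```python
def l1_share(weights, top_k=3):
    """Return (index, value, share) for the top_k entries by |value|.

    ASSUMPTION: share = |value| / L1 norm of the full vector. The source
    page does not state how its "% of L1" column is computed.
    """
    l1 = sum(abs(w) for w in weights)
    if l1 == 0:
        return []  # an all-zero vector has no meaningful shares
    # Rank entries by magnitude, largest first, keeping original indices.
    ranked = sorted(enumerate(weights), key=lambda iw: abs(iw[1]), reverse=True)
    return [(i, w, abs(w) / l1) for i, w in ranked[:top_k]]


# Toy vector (hypothetical data); its L1 norm is exactly 1.0.
toy = [0.5, -0.25, 0.125, 0.125]
rows = l1_share(toy, top_k=2)
for index, value, share in rows:
    print(f"{index}\t{value:+.2f}\t{share:.1%}")
```

Under this reading, a value of +0.18 at 0.7% would imply an L₁ norm of roughly 25, though the displayed figures are rounded, so that estimate is only approximate.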
Correlated Neurons
Index   P. Corr.   Cos Sim.
1473    +0.18      0.04
191     +0.10      0.03
261     +0.10      0.03
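The two columns above are not defined on the page. As a sketch, assuming "P. Corr." is a Pearson correlation and "Cos Sim." is a cosine similarity, the two measures can be computed as follows. Which vectors the dashboard actually compares (activations, weights, or something else) is not stated, so the inputs here are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors.

    Raises ZeroDivisionError if either vector is all zeros.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def pearson_correlation(a, b):
    """Pearson correlation: cosine similarity of the mean-centered vectors.

    Raises ZeroDivisionError if either vector is constant.
    """
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return cosine_similarity([x - mean_a for x in a], [y - mean_b for y in b])


# Hypothetical inputs: b is a reversed in order, so the correlation is
# perfectly negative while the cosine similarity stays positive.
a = [1.0, 2.0, 3.0]
b = [3.0, 2.0, 1.0]
corr = pearson_correlation(a, b)
cos = cosine_similarity(a, b)
```

The toy inputs show that the two measures can disagree, because Pearson correlation removes each vector's mean before comparing and cosine similarity does not.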
Negative Logits
Token         Logit
<bos>         -2.22
intersper     -1.38
encomp        -1.17
quitted       -1.11
depic         -1.02
reluct        -1.00
superintend   -0.99
disbur        -0.97
endow         -0.96
adjour        -0.94
Positive Logits
Token         Logit
procedura     0.77
Toilette      0.75
GRATU         0.74
BnF           0.72
ferie         0.71
sorella       0.71
polizia       0.70
sfera         0.68
kupa          0.68
ferrovia      0.68
Activations Density: 0.159%