INDEX
Explanations
phrases related to rules, violations, and legal terms
Neuron Alignment
Index   Value   % of L₁
50      +0.18   0.9%
1942    +0.09   0.4%
699     +0.09   0.4%
Correlated Neurons
Index   P. Corr.   Cos Sim.
188     +0.18      0.03
944     +0.09      0.02
193     +0.09      0.03
Negative Logits
Token            Logit
<bos>            -3.11
/***             -0.82
initComponents   -0.74
///**            -0.70
ReactDOM         -0.67
//});            -0.67
//};             -0.64
intios           -0.62
addCriterion     -0.61
EndContext       -0.61
Positive Logits
Token     Logit
Minang    1.20
Juf       1.19
affor     1.19
wien      1.15
salomon   1.12
toledo    1.09
Viol      1.09
beverly   1.09
casio     1.08
fta       1.07
Activations Density: 0.127%