Explanations
phrases concerning legal disputes and ethical considerations
Neuron Alignment
Index   Value   % of L₁
604     +0.14   0.4%
1919    +0.12   0.4%
872     +0.10   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1919    +0.14      0.10
1697    +0.12      0.02
1510    +0.10      0.05
Negative Logits
magis       -0.96
lele        -0.92
territo     -0.86
poliester   -0.85
pama        -0.83
naer        -0.83
bandung     -0.80
haer        -0.79
levis       -0.79
°;          -0.77
Positive Logits
should      0.98
ought       0.90
shouldn     0.84
must        0.76
should      0.75
could       0.69
Should      0.69
cannot      0.68
Should      0.66
might       0.65
Activations Density 0.445%