Explanations
words related to legal or policy matters
Neuron Alignment
Index    Value    % of L₁
674      +0.13    0.4%
1343     +0.09    0.3%
1473     +0.09    0.2%
Correlated Neurons
Index    P. Corr.    Cos Sim.
1473     +0.13       0.03
490      +0.09       0.03
1799     +0.09       0.02
Negative Logits
However    -0.51
muer       -0.45
trà        -0.45
Poli       -0.44
balo       -0.44
riba       -0.44
kuper      -0.44
However    -0.44
however    -0.43
Mat        -0.42
Positive Logits
mondeo              0.71
stickied            0.66
geforce             0.64
(no token shown)    0.64
/**                 0.62
avenger             0.62
souffrance          0.60
fortnite            0.60
jetta               0.58
zelda               0.58
Activations Density 0.284%