Explanations
words related to legal or bureaucratic processes
Neuron Alignment
Index     Value     % of L₁
2019      +0.23     0.8%
1741      +0.14     0.5%
1445      +0.13     0.5%
Correlated Neurons
Index     P. Corr.  Cos Sim.
1406      +0.23     0.05
1950      +0.14     0.05
1445      +0.13     0.06
Negative Logits
bourgeo   -1.31
sappi     -1.31
applau    -1.26
incess    -1.26
igno      -1.17
nutr      -1.13
emphat    -1.12
ordina    -1.12
;;)       -1.11
simplif   -1.10
Positive Logits
there     0.78
although  0.73
it        0.70
while     0.69
we        0.66
they      0.64
the       0.61
if        0.61
despite   0.60
since     0.60
Activations Density 0.245%