Explanations
phrases related to data access and systems vulnerability

Neuron Alignment
Index   Value   % of L₁
752     +0.20   0.7%
1967    +0.17   0.5%
2016    +0.10   0.3%

Correlated Neurons
Index   P. Corr.   Cos Sim.
752     +0.20      0.09
2016    +0.17      0.09
1967    +0.10      0.06

Negative Logits
Token    Logit
!...     -1.65
?...     -1.64
ftu      -1.60
fta      -1.55
effe     -1.51
:,,      -1.50
thut     -1.50
emphat   -1.49
purcha   -1.49
ftre     -1.48

Positive Logits
Token      Logit
both       0.85
the        0.77
each       0.76
those      0.75
our        0.73
their      0.73
every      0.71
whatever   0.70
what       0.68
these      0.67

Activations Density: 0.726%