INDEX
Explanations
specific numerical references, likely in legal or formal contexts
Neuron Alignment
Index   Value   % of L₁
7       +0.13   0.8%
231     +0.13   0.8%
499     +0.13   0.7%
Correlated Neurons
Index   P. Corr.   Cos Sim.
7       +0.13      0.01
168     +0.13      0.02
495     +0.13      0.03
Negative Logits
(%           -1.54
everywhere   -1.49
%            -1.47
alike        -1.44
hover        -1.42
experiment   -1.40
dominate     -1.40
ipel         -1.39
trillion     -1.38
pill         -1.37
Positive Logits
ETHOD         1.79
unnumbered    1.56
Strickland    1.55
isciplinary   1.53
IAL           1.50
COPYRIGHT     1.49
MENTS         1.47
sidered       1.45
VERTISEMENT   1.44
Brad          1.42
Activations Density: 0.230%