INDEX
Explanations
the presence of technical terms or jargon related to websites, search results, and user experience
Neuron Alignment
Index    Value    % of L₁
674      +0.16    0.5%
1843     +0.10    0.3%
511      +0.09    0.3%
Correlated Neurons
Index    P. Corr.    Cos Sim.
1415     +0.16       0.05
1919     +0.10       0.06
1843     +0.09       0.04
Negative Logits
Coim        -0.92
McLaugh     -0.91
Vaugh       -0.91
Augu        -0.85
Penna       -0.80
McInt       -0.80
Thos        -0.79
Juf         -0.77
Rine        -0.76
Bartholo    -0.76
Positive Logits
<bos>       1.25
reading     0.77
read        0.75
reading     0.71
READING     0.68
read        0.67
digest      0.65
reads       0.57
reader      0.57
READ        0.56
Activations Density: 0.538%