INDEX
Explanations
the word "the" in various contexts
Neuron Alignment
Index    Value    % of L₁
98       +0.17    1.0%
125      +0.12    0.7%
400      +0.11    0.6%
Correlated Neurons
Index    P. Corr.    Cos Sim.
291      +0.17       0.05
402      +0.12       0.04
480      +0.11       0.04
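The two columns in the Correlated Neurons table can differ widely for the same neuron. Assuming "P. Corr." abbreviates Pearson correlation and "Cos Sim." cosine similarity (the dashboard does not spell these out, nor which vectors each is computed over), a minimal sketch of both measures looks like the following. The function names and sample vectors are illustrative only and are not taken from the dashboard.

```python
import math


def pearson(xs, ys):
    """Pearson correlation: cosine similarity of the mean-centred vectors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    num = sum(a * b for a, b in zip(dx, dy))
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return num / den


def cosine(xs, ys):
    """Cosine similarity: dot product divided by the product of the norms."""
    num = sum(a * b for a, b in zip(xs, ys))
    den = math.sqrt(sum(a * a for a in xs)) * math.sqrt(sum(b * b for b in ys))
    return num / den


# Hypothetical vectors: the second is the first shifted by a constant.
a = [1.0, 2.0, 3.0]
b = [11.0, 12.0, 13.0]
print(pearson(a, b), cosine(a, b))
```

Pearson correlation ignores a constant offset because it centres each vector first, while cosine similarity does not, so the two measures need not agree.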
Negative Logits
Ī    -2.36
Ĩ    -2.14
ī    -2.13
¤    -2.03
°    -2.01
»    -1.99
«    -1.98
²    -1.95
®    -1.92
ª    -1.85
Positive Logits
ways              1.80
orems             1.77
aforementioned    1.68
way               1.68
above             1.66
resources         1.63
rapeut            1.62
ogene             1.61
ses               1.55
steps             1.54
Activations Density 0.237%