INDEX
Explanations
instances of the word "of."
New Auto-Interp
Neuron Alignment
Index   Value   % of L₁
23      +0.17   0.9%
12      +0.11   0.6%
412     +0.11   0.6%
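
The "% of L₁" column is not defined on the card. One plausible reading, offered here as an assumption, is that it reports the absolute value of a neuron's weight as a share of the L₁ norm of the whole weight vector. The rows shown are at least consistent with that reading (0.17 / 0.009 is roughly 19, and 0.11 / 0.006 is roughly 18). A minimal sketch under that assumption, where `l1_share` is a hypothetical helper name and the toy vector is made up:

```python
def l1_share(weights, index):
    """Return |weights[index]| as a fraction of the vector's L1 norm.

    Assumption: this is what the "% of L1" column reports; the card
    itself does not define the column.
    """
    l1 = sum(abs(w) for w in weights)
    if l1 == 0:
        raise ValueError("a zero vector has no L1 norm to take a share of")
    return abs(weights[index]) / l1


# Toy vector with made-up values (not the real feature's weights).
toy = [0.5, -0.25, 0.25]
print(f"{l1_share(toy, 0):.1%}")  # 50.0%
```

A small share per neuron, as in the table, would mean the weight is spread across many neurons rather than concentrated on a few.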
Correlated Neurons
Index   P. Corr.   Cos Sim.
30      +0.17      0.03
18      +0.11      0.03
12      +0.11      0.02
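
The card does not say how the two columns above are computed. Assuming "P. Corr." is the Pearson correlation between two activation series over the same tokens, and "Cos Sim." is the cosine similarity between two vectors, the two quantities can be sketched as follows. The function names and sample data are illustrative only.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Made-up data: orthogonal vectors have cosine similarity 0.
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0
```

Pearson correlation subtracts the means before comparing and cosine similarity does not, so the two columns need not agree even when computed from the same kind of data.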
Negative Logits
ģ       -2.34
©       -2.22
ī       -2.22
¸       -2.17
¼       -2.00
İ       -1.98
Ĵ       -1.98
¡       -1.98
ĭ       -1.97
¢       -1.97
Positive Logits
opes         1.73
clusions     1.62
strange      1.54
knots        1.54
years        1.53
weeks        1.48
Parenthood   1.47
cgi          1.46
silly        1.46
ittal        1.45
Activations Density 0.196%
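
The density figure is stated without a definition. Assuming it is the fraction of sampled tokens on which the feature's activation is nonzero, a minimal sketch looks like this. The `activation_density` name, the `threshold` parameter, and the sample activations are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of activations strictly above `threshold`.

    Assumption: "density" means the share of tokens on which the
    feature fires at all; the card does not define the term.
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Made-up activations over ten tokens; the feature fires on two of them.
sample = [0.0, 0.0, 0.0, 1.2, 0.0, 0.0, 0.0, 0.0, 0.3, 0.0]
print(f"{activation_density(sample):.1%}")  # 20.0%
```

Under this reading, a density of 0.196% would correspond to the feature firing on roughly 2 of every 1,000 tokens.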