INDEX
Explanations
various forms of the word "from" in relation to comparisons and origins
Neuron Alignment
Index   Value   % of L₁
219     +0.12   0.7%
485     +0.11   0.6%
155     +0.11   0.6%
Correlated Neurons
Index   P. Corr.   Cos Sim.
300     +0.12      0.09
409     +0.11      0.11
176     +0.11      0.10
Negative Logits
interim    -1.60
ium        -1.54
Lett       -1.47
*(         -1.42
ered       -1.41
cember     -1.36
Ins        -1.34
involved   -1.33
staff      -1.32
esc        -1.32
Positive Logits
ĥ½                   3.50
ŀ                    3.48
ĨĴ                   3.38
IJ                   3.37
č↵                   3.32
↵                    3.32
↵                    3.32
↵                    3.32
↵                    3.32
(no visible token)   3.32
Activations Density: 1.557%