INDEX
Explanations
the infinitive form of verbs
Neuron Alignment
Index   Value   % of L₁
197     +0.12   0.7%
51      +0.12   0.6%
343     +0.12   0.6%
Correlated Neurons
Index   P. Corr.   Cos Sim.
69      +0.12      0.04
109     +0.12      0.03
434     +0.12      0.03
Negative Logits
IJ
-2.58
ı
-2.04
ī
-1.85
ķ
-1.84
ħ
-1.74
¿½
-1.69
Ī
-1.58
halves
-1.58
↵
-1.56
↵
-1.56
POSITIVE LOGITS
itude       1.75
hew         1.74
schedule    1.53
iculous     1.51
javase      1.50
7554        1.43
hello       1.43
ibilities   1.40
blog        1.39
docs        1.38
Activations Density 0.027%