INDEX
Explanations
instances of the term "interface" in various contexts
Neuron Alignment
Index   Value   % of L₁
376     +0.17   1.0%
14      +0.14   0.8%
507     +0.11   0.6%
Correlated Neurons
Index   P. Corr.   Cos Sim.
14      +0.17      0.03
234     +0.14      0.02
501     +0.11      0.02
Negative Logits
Ĥ            -1.92
Ļª           -1.88
occasional   -1.62
unnumbered   -1.59
nasty        -1.52
´            -1.52
Yankees      -1.49
ĥ            -1.45
sudden       -1.41
Ģ            -1.40
Positive Logits
arium    2.30
amente   1.92
iga      1.87
cias     1.82
ily      1.80
nal      1.73
emis     1.70
aho      1.64
hel      1.64
eful     1.64
Activations Density: 0.020%