Explanations
variations of the word "op."
Neuron Alignment
Index   Value   % of L₁
23      +0.24   1.4%
317     +0.14   0.8%
511     +0.12   0.6%
Correlated Neurons
Index   P. Corr.   Cos Sim.
23      +0.24      0.02
317     +0.14      0.03
511     +0.12      0.02
Negative Logits
iciary    -1.85
facts     -1.63
journal   -1.62
ungen     -1.60
isten     -1.51
naire     -1.51
arios     -1.49
smen      -1.48
ĥ½        -1.46
enstein   -1.45
Positive Logits
forward    1.61
.’”        1.59
.’         1.58
oming      1.54
cling      1.53
aping      1.52
slow       1.51
downward   1.50
ative      1.48
rainbow    1.46
Activations Density: 0.171%