INDEX
Explanations
annotated diagrams in scientific contexts or presentations
Neuron Alignment
Index   Value   % of L₁
906     +0.14   0.4%
1343    +0.12   0.4%
1385    +0.12   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1856    +0.14      0.04
906     +0.12      0.00
1082    +0.12      0.02
Negative Logits
republi   -0.75
klap      -0.74
nabi      -0.72
deko      -0.71
katal     -0.71
radikal   -0.70
Rektor    -0.70
kaos      -0.69
optik     -0.69
stik      -0.69
Positive Logits
impractica     0.68
arrows         0.62
McLaugh        0.60
MATHEMATICAL   0.59
pamph          0.59
Cringe         0.59
Throwaway      0.56
unve           0.55
unwarran       0.54
Ehh            0.54
Activations Density: 0.173%