INDEX
Explanations
Sections referring to programming concepts, code, and technical details
Neuron Alignment
Index   Value   % of L₁
1741    +0.16   0.5%
468     +0.14   0.5%
1177    +0.13   0.4%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1288    +0.16      0.05
468     +0.14      0.03
1148    +0.13      0.03
Negative Logits
barbarous       -0.69
sceptre         -0.67
Whigs           -0.64
infernal        -0.64
warlike         -0.64
wretch          -0.64
fulness         -0.63
withal          -0.63
vicissitudes    -0.62
lamella         -0.61
Positive Logits
UnusedPrivate     0.62
hatenablog        0.57
<<<<<<<<<<<<<<    0.56
bezeichneter      0.53
مشين              0.52
sembla            0.52
identifiant       0.51
awtextra          0.51
كومونز            0.51
vnd               0.51
Activations Density 0.457%