INDEX
Explanations
references to adults in various contexts
Neuron Alignment
Index   Value   % of L₁
156     +0.24   1.4%
362     +0.14   0.8%
407     +0.13   0.7%
Correlated Neurons
Index   P. Corr.   Cos Sim.
407     +0.24      0.02
445     +0.14      0.01
362     +0.13      0.02
Negative Logits
иÑģ         -1.64
ched        -1.49
oci         -1.48
subseteq    -1.47
аÑģÑģ       -1.45
sible       -1.43
ĨĴ          -1.40
for         -1.38
defer       -1.38
еÑģ         -1.38
Positive Logits
lives         1.78
equivalents   1.76
remains       1.73
icides        1.70
living        1.63
dispute       1.58
ennial        1.54
basis         1.54
urious        1.54
quarters      1.51
Activations Density 0.059%