INDEX
Explanations
occurrences of the word "FROM."
Neuron Alignment
Index    Value    % of L₁
376      +0.16    0.9%
129      +0.13    0.7%
412      +0.10    0.6%
Correlated Neurons
Index    P. Corr.    Cos Sim.
417      +0.16       0.01
129      +0.13       0.01
205      +0.10       0.01
Negative Logits
hydroxyl    -1.49
traced      -1.46
atin        -1.42
brow        -1.41
'?          -1.38
'?"         -1.36
âĢī         -1.35
'$          -1.32
res         -1.31
hem         -1.31
Positive Logits
heed        1.92
anks        1.65
brakes      1.65
thouse      1.59
draft       1.55
aine        1.54
ESULT       1.52
aylor       1.52
Rptr        1.51
bushes      1.51
Activations Density: 0.138%