INDEX
Explanations
punctuation marks, particularly closing parentheses
Neuron Alignment
Index   Value   % of L₁
23      +0.30   1.8%
258     +0.21   1.2%
250     +0.20   1.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
250     +0.30      0.20
412     +0.21      0.14
258     +0.20      0.14
Negative Logits
↵                -3.98
↵ ↵              -3.98
(not rendered)   -3.98
(not rendered)   -3.98
↵                -3.98
↵↵↵              -3.98
(not rendered)   -3.98
(not rendered)   -3.98
(not rendered)   -3.98
↵↵               -3.98
Positive Logits
marriages   1.77
wed         1.75
marrying    1.71
married     1.68
divorced    1.61
psy         1.59
between     1.49
marriage    1.48
intimate    1.45
”.          1.44
Activations Density: 5.441%