INDEX
Explanations
the letter 's' in various contexts
Neuron Alignment
Index    Value    % of L₁
50       +0.24    0.8%
131      +0.07    0.2%
251      +0.07    0.2%
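The "% of L₁" column is not defined on this page. One plausible reading, assumed here, is that it reports each neuron's absolute value as a share of the L₁ norm of the whole vector. The sketch below illustrates that reading on a made-up vector; the function name `l1_share` and the toy weights are hypothetical and are not taken from the dashboard.

```python
def l1_share(values):
    """Return each entry's share of the vector's L1 norm, in percent.

    Assumption: "% of L1" means 100 * |v_i| / sum_j |v_j|.
    """
    l1 = sum(abs(v) for v in values)
    if l1 == 0:
        # An all-zero vector has no meaningful shares.
        return [0.0 for _ in values]
    return [100.0 * abs(v) / l1 for v in values]


# Hypothetical toy vector, not data from the dashboard.
weights = [0.5, -0.25, 0.25]
shares = l1_share(weights)
for index, (w, s) in enumerate(zip(weights, shares)):
    print(f"{index}\t{w:+.2f}\t{s:.1f}%")
```

Under this reading, a value of +0.24 at 0.8% would imply an L₁ norm of roughly 30 for the full vector, which is also consistent with +0.07 rounding to 0.2%.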
Correlated Neurons
Index    P. Corr.    Cos Sim.
1454     +0.24       0.06
473      +0.07       0.07
42       +0.07       0.07
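The two columns above, read here as Pearson correlation and cosine similarity, can diverge because Pearson correlation centers each vector on its mean before comparing directions, while cosine similarity does not. The page does not say which vectors are compared, so the sketch below uses generic, hypothetical inputs and only shows how the two measures relate.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Raises ZeroDivisionError if either vector has zero norm.
    return dot / (norm_a * norm_b)


def pearson_correlation(a, b):
    """Pearson correlation: cosine similarity of the mean-centered vectors."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return cosine_similarity(
        [x - mean_a for x in a],
        [y - mean_b for y in b],
    )


# Hypothetical vectors: b is a shifted and noisy copy of a.
a = [1.0, 2.0, 3.0, 4.0]
b = [11.0, 12.5, 12.5, 14.0]
print(f"cosine:  {cosine_similarity(a, b):+.2f}")
print(f"pearson: {pearson_correlation(a, b):+.2f}")
```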
Negative Logits
<bos>    -1.59
させていただきます    -0.68
ω    -0.68
α    -0.64
أيضًا    -0.64
защото    -0.64
〈    -0.64
—    -0.64
日    -0.63
govine    -0.63
Positive Logits
reluct      1.93
accla       1.85
disagre     1.80
indestru    1.79
maneu       1.74
shenan      1.73
impra       1.70
apprehen    1.70
increa      1.68
unspeak     1.68
Activations Density: 1.468%