INDEX
Explanations
phrases indicating a request for feedback or information sharing.
Neuron Alignment
Index    Value    % of L₁
763      +0.08    0.2%
1150     +0.08    0.2%
1343     +0.08    0.2%
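The Neuron Alignment table can be read as the largest entries of the feature's decoder vector over MLP neurons, each shown with its share of that vector's L₁ norm. That reading is an assumption about how this dashboard computes the columns. The sketch below uses a hypothetical `neuron_alignment` helper and a made-up toy vector to show the arithmetic. Under the same reading, a weight of +0.08 at 0.2% implies an L₁ norm on the order of 0.08 / 0.002 ≈ 40 (both figures are rounded).

```python
def neuron_alignment(decoder_row, k=3):
    """Hypothetical helper: top-k neurons by |weight| in a feature's decoder row.

    Returns (neuron index, signed weight, fraction of the row's L1 norm).
    Assumes the table ranks by magnitude and reports |w_i| / sum_j |w_j|.
    """
    l1 = sum(abs(w) for w in decoder_row)
    ranked = sorted(range(len(decoder_row)),
                    key=lambda i: abs(decoder_row[i]), reverse=True)
    return [(i, decoder_row[i], abs(decoder_row[i]) / l1) for i in ranked[:k]]


# Toy decoder row over 10 "neurons" (made-up values, not from this feature).
row = [0.0] * 10
row[2], row[7], row[4] = 0.5, -0.3, 0.2
for idx, value, share in neuron_alignment(row):
    print(f"{idx:<6}{value:+.2f}  {share:.1%}")
```

Ranking by magnitude keeps strongly negative weights visible. If the dashboard instead ranks by signed value, replace `abs(decoder_row[i])` in the sort key with `decoder_row[i]`.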
Correlated Neurons
Index    Pearson Corr.    Cos Sim.
763      +0.08            0.03
1056     +0.08            0.03
1799     +0.08            0.01
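This sketch assumes the Correlated Neurons columns compare the feature's activations with each neuron's activations over the same tokens. Pearson correlation centers both series first, while cosine similarity uses the raw (uncentered) vectors, so the two columns can differ. This is a reading of the headers, not a confirmed description of the dashboard's code. The helpers `pearson`, `cosine` and `correlated_neurons` are hypothetical, and the inputs are toy numbers.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def cosine(x, y):
    """Uncentered cosine similarity of two equal-length sequences."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny) if nx and ny else 0.0


def correlated_neurons(feature_acts, neuron_acts, k=3):
    """Rank neurons by Pearson correlation with the feature; also report cosine.

    neuron_acts[j] holds neuron j's activations on the same tokens as feature_acts.
    """
    rows = [(j, pearson(feature_acts, acts), cosine(feature_acts, acts))
            for j, acts in enumerate(neuron_acts)]
    return sorted(rows, key=lambda r: r[1], reverse=True)[:k]


# Toy data: one feature and three neurons over five tokens (made-up values).
feature = [0.0, 1.0, 0.0, 2.0, 0.0]
neurons = [
    [0.1, 0.9, 0.2, 1.8, 0.0],  # tracks the feature closely
    [1.0, 0.0, 1.0, 0.0, 1.0],  # active when the feature is off
    [0.5, 0.4, 0.6, 0.5, 0.4],  # mostly unrelated
]
for j, p, c in correlated_neurons(feature, neurons):
    print(f"{j:<6}{p:+.2f}  {c:.2f}")
```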
Negative Logits
kram      -0.89
kupa      -0.89
makro     -0.88
alkoh     -0.88
lele      -0.88
akut      -0.86
karton    -0.84
silikon   -0.84
fua       -0.83
antik     -0.82
Positive Logits
how       0.67
whether   0.65
if        0.60
what      0.57
tell      0.55
any       0.55
your      0.54
whether   0.52
via       0.52
einander  0.51
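This sketch assumes the Positive and Negative Logits lists come from projecting the feature's output direction onto the unembedding and keeping the tokens with the highest and lowest scores. The dashboard may differ, for example by folding in a final layer norm or by first mapping a neuron-space direction through the MLP output weights. The hypothetical `logit_effects` helper below shows only the core projection with a tiny made-up unembedding. The token strings are borrowed from the lists purely as labels.

```python
def logit_effects(direction, W_U, vocab, k=3):
    """Hypothetical helper: direct effect of a residual-stream direction on each logit.

    direction: list of length d_model.
    W_U: unembedding as d_model rows, each with one entry per vocab token.
    Returns (top-k positive, top-k negative) lists of (token, score).
    """
    scores = [sum(direction[r] * W_U[r][t] for r in range(len(direction)))
              for t in range(len(vocab))]
    order = sorted(range(len(vocab)), key=lambda t: scores[t])  # ascending
    negative = [(vocab[t], scores[t]) for t in order[:k]]
    positive = [(vocab[t], scores[t]) for t in reversed(order[-k:])]
    return positive, negative


# Toy example: d_model = 2, vocabulary of 4 tokens, made-up weights.
vocab = ["how", "kram", "if", "kupa"]
W_U = [[0.5, -0.2, 0.9, -0.7],
       [0.1, 0.3, -0.4, 0.2]]
pos, neg = logit_effects([1.0, 0.0], W_U, vocab, k=2)
print("positive:", pos)
print("negative:", neg)
```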
Activation Density: 0.136%
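Activation density is commonly defined as the fraction of tokens on which the feature's activation is strictly positive. At 0.136%, that is roughly 1 token in 735 (100 / 0.136 ≈ 735). The definition is an assumption about this dashboard, and `activation_density` below is a hypothetical helper run on toy values.

```python
def activation_density(acts, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold` (assumed definition)."""
    return sum(1 for a in acts if a > threshold) / len(acts)


# Toy activations over 8 tokens (made-up values): 2 of the 8 are positive.
acts = [0.0, 0.5, 0.0, 0.0, 1.2, 0.0, 0.0, 0.0]
print(f"Activation Density: {activation_density(acts):.3%}")
```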