INDEX
Explanations
words related to details or specific features in a context of a narrative or instructions
Neuron Alignment
Index   Value   % of L₁
569     +0.11   0.3%
1044    +0.09   0.3%
1871    +0.08   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1265    +0.11      0.04
330     +0.09      0.03
783     +0.08      0.03
Negative Logits
brune       -0.98
alkoh       -0.97
depic       -0.97
McLaugh     -0.94
rigide      -0.92
fré         -0.91
apprehen    -0.90
silikon     -0.90
jette       -0.88
intersper   -0.88
Positive Logits
provides    0.97
allows      0.95
creates     0.94
does        0.93
gets        0.93
makes       0.93
gives       0.92
doesn       0.90
generates   0.88
goes        0.87
Activations Density 0.541%