INDEX
Explanations
Instances of quotes or dialogue (New Auto-Interp)
Neuron Alignment
Index   Value   % of L₁
478     +0.16   0.9%
6       +0.15   0.8%
261     +0.14   0.8%
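A minimal sketch of how a Neuron Alignment row could be derived. It assumes the table lists the largest positive entries of the feature's decoder vector (one weight per MLP neuron), and that "% of L₁" is an entry's magnitude divided by the vector's L₁ norm. The function name `neuron_alignment` and the toy vector are hypothetical, not the dashboard's own code.

```python
import numpy as np


def neuron_alignment(decoder_vec: np.ndarray, k: int = 3):
    """Return (index, value, percent of L1 norm) for the k largest weights.

    Assumption (hypothetical): "% of L1" = |value| / ||decoder_vec||_1 * 100.
    """
    l1 = np.abs(decoder_vec).sum()
    # Sort indices by signed weight, largest first, and keep the top k
    top = np.argsort(decoder_vec)[::-1][:k]
    return [
        (int(i), float(decoder_vec[i]), 100.0 * abs(float(decoder_vec[i])) / l1)
        for i in top
    ]


if __name__ == "__main__":
    # Toy 4-neuron decoder vector; its L1 norm is 1.0
    for row in neuron_alignment(np.array([0.1, -0.3, 0.4, 0.2])):
        print(row)
```

Under this reading, +0.16 being 0.9% of the L₁ norm implies a total L₁ norm of roughly 0.16 / 0.009 ≈ 18, so no single neuron carries much of the feature's decoder weight.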
Correlated Neurons
Index   Pearson Corr.   Cos. Sim.
261     +0.16           0.04
66      +0.15           0.04
466     +0.14           0.04
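The two Correlated Neurons columns can be illustrated with their standard formulas. This sketch assumes "P. Corr." is the Pearson correlation between the feature's activations and a neuron's activations over the same tokens, and "Cos Sim." is a cosine similarity between two weight vectors. Which vectors the dashboard compares is not stated here, so the inputs below are hypothetical.

```python
import numpy as np


def pearson_corr(x, y) -> float:
    """Pearson correlation: center both series, then normalize their dot product."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float(xc @ yc / (np.linalg.norm(xc) * np.linalg.norm(yc)))


def cosine_sim(u, v) -> float:
    """Cosine similarity: dot product of the uncentered vectors over their norms."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


if __name__ == "__main__":
    # Hypothetical per-token activations for a feature and a neuron
    feature_acts = [0.0, 0.0, 1.2, 0.0, 0.4]
    neuron_acts = [0.1, -0.2, 0.9, 0.0, 0.3]
    print(pearson_corr(feature_acts, neuron_acts))
    print(cosine_sim([1.0, 0.0], [1.0, 1.0]))
```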
Negative Logits
existed    -1.57
etition    -1.56
matters    -1.52
ests       -1.51
exists     -1.50
ository    -1.47
mattered   -1.45
fecture    -1.42
icial      -1.41
exist      -1.40
Positive Logits
ĭ     2.64
®     2.63
ĻĤ    2.40
¿½    2.38
ľĵ    2.27
°     2.24
·     2.19
Ľ     2.15
³     2.13
¦     2.10
Activation Density: 0.220%
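Assuming activation density is the percentage of dataset tokens on which the feature's activation is positive, 0.220% corresponds to about 22 active tokens per 10,000. A sketch of that computation, using a hypothetical `activation_density` helper and a threshold of zero (also an assumption):

```python
import numpy as np


def activation_density(acts, threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds the threshold."""
    acts = np.asarray(acts, dtype=float)
    # Boolean mask of active tokens; its mean is the active fraction
    return 100.0 * float(np.mean(acts > threshold))


if __name__ == "__main__":
    # Hypothetical activations over 10,000 tokens with 22 positive entries
    acts = np.zeros(10_000)
    acts[:22] = 0.5
    print(f"{activation_density(acts):.3f}%")
```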