INDEX
Explanations
phrases related to poetry or song lyrics
Neuron Alignment
Index    Value    % of L₁
1150     +0.17    0.6%
1510     +0.11    0.4%
906      +0.11    0.4%
Correlated Neurons
Index    Pearson Corr.    Cos Sim.
282      +0.17            0.02
509      +0.11            0.03
1380     +0.11            0.00
Negative Logits
Fácil       -0.81
Imágenes    -0.72
Hermoso     -0.71
Fø          -0.69
Legături    -0.69
Πηγή        -0.68
Flere       -0.67
Acab        -0.64
Muito       -0.64
Hvad        -0.62
Positive Logits
embodi      1.76
oleo        1.45
?...        1.40
fluo        1.39
!...        1.38
unden       1.33
squa        1.32
friable     1.28
nutella     1.27
pollut      1.27
Activations Density: 0.218%