INDEX
Explanations
descriptions of physical interactions between characters
Neuron Alignment
Index   Value   % of L₁
906     +0.13   0.4%
604     +0.13   0.4%
674     +0.12   0.4%
Correlated Neurons
Index   P. Corr.   Cos Sim.
343     +0.13      0.04
736     +0.13      0.04
1533    +0.12      0.01
Negative Logits
apprehen   -1.25
gaily      -1.21
accla      -1.07
mischie    -1.06
intrigu    -1.04
maneu      -1.03
disagre    -1.01
pooh       -1.00
shenan     -0.98
fuf        -0.96
Positive Logits
Palabras      0.54
words         0.52
cât           0.52
aproape       0.51
îna           0.50
words         0.50
invokeLater   0.49
hoeddwyd      0.48
vă            0.48
зулта         0.48
Activations Density 0.249%