INDEX
Explanations
phrases related to physical actions or interactions
Neuron Alignment
Index   Value   % of L₁
1356    +0.09   0.3%
1038    +0.08   0.2%
1103    +0.08   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
392     +0.09      0.03
1356    +0.08      0.03
1608    +0.08      0.02
Negative Logits
quitted     -0.98
gaily       -0.96
apprehen    -0.95
maneu       -0.95
disagre     -0.94
berea       -0.93
depic       -0.92
encomp      -0.90
thut        -0.90
shewn       -0.89
Positive Logits
batore          0.56
Milán           0.53
gelo            0.52
sodio           0.51
poliuret        0.51
manuten         0.51
Enf             0.50
spaceBetween    0.50
Holanda         0.48
Suecia          0.48
Activations Density 0.455%