INDEX
Explanations
technical issues or problems in a context
Neuron Alignment
Index   Value   % of L₁
468     +0.12   0.4%
1265    +0.12   0.4%
990     +0.11   0.3%
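The source does not define the "% of L₁" column. One plausible reading is that each value is that neuron's share of the L₁ norm (sum of absolute values) of some per-neuron weight vector. The sketch below illustrates that reading only; the function names `l1_share` and `top_aligned` and the input vector are hypothetical, not part of the dashboard.

```python
def l1_share(weights):
    """Return each entry's share of the vector's L1 norm, in percent.

    Assumption: "% of L1" means 100 * |w_i| / sum_j |w_j|.
    """
    l1 = sum(abs(w) for w in weights)
    if l1 == 0:
        # An all-zero vector has no meaningful shares.
        return [0.0 for _ in weights]
    return [100.0 * abs(w) / l1 for w in weights]


def top_aligned(weights, k=3):
    """Return (index, value, percent_of_l1) for the k largest-magnitude entries."""
    shares = l1_share(weights)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    return [(i, weights[i], shares[i]) for i in order[:k]]


if __name__ == "__main__":
    # Hypothetical toy vector, not data from the dashboard.
    for index, value, pct in top_aligned([0.5, -1.5, 2.0], k=2):
        print(f"{index}\t{value:+.2f}\t{pct:.1f}%")
```

Under this reading, a value of +0.12 at 0.4% would imply an L₁ norm of roughly 30, which is consistent with the rounded figures in the table but is not confirmed by the source.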
Correlated Neurons
Index   P. Corr.   Cos Sim.
990     +0.12      0.07
468     +0.12      0.07
36      +0.11      0.06
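"P. Corr." and "Cos Sim." presumably stand for Pearson correlation and cosine similarity. The source does not say which vectors are compared, so the sketch below only shows how the two measures relate for two equal-length sequences: Pearson correlation is cosine similarity applied after subtracting each sequence's mean. The function names and inputs are illustrative assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Undefined for a zero vector; return 0.0 by convention in this sketch.
        return 0.0
    return dot / (norm_a * norm_b)


def pearson_correlation(a, b):
    """Pearson correlation, computed as cosine similarity of mean-centered vectors."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    centered_a = [x - mean_a for x in a]
    centered_b = [y - mean_b for y in b]
    return cosine_similarity(centered_a, centered_b)


if __name__ == "__main__":
    # Hypothetical toy data, not values from the dashboard.
    a = [1.0, 2.0, 3.0]
    b = [3.0, 2.0, 1.0]
    print(pearson_correlation(a, b), cosine_similarity(a, b))
```

Because centering removes any shared offset, the two measures can differ substantially on the same pair of vectors, which is one reason a dashboard might report both.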
Negative Logits
vogli           -0.66
voglio          -0.65
Serviço         -0.64
vedo            -0.61
Personendaten   -0.61
interessa       -0.60
vogliamo        -0.59
penso           -0.59
vorrei          -0.59
Ilustra         -0.58
Positive Logits
tolerably       0.94
nobly           0.86
gaily           0.86
beaute          0.84
withal          0.82
unwarran        0.82
unce            0.80
pite            0.79
schoolmaster    0.78
surmounted      0.76
Activations Density 0.380%