INDEX
Explanations
References to research studies and reports
New Auto-Interp
Neuron Alignment
Index   Value   % of L₁
284     +0.09   0.3%
1055    +0.09   0.3%
227     +0.09   0.2%
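The "% of L₁" column can be read as each neuron's share of the L₁ norm of the feature's weight vector over neurons. The sketch below assumes that reading. It also assumes the vector is the feature's decoder direction and that rows are ranked by absolute weight, although the dashboard may rank by signed value. The function name `neuron_alignment` is hypothetical.

```python
def neuron_alignment(weights, top_k=3):
    """Return (index, weight, share_of_l1) for the top_k largest-magnitude weights.

    Assumptions: `weights` is the feature's weight vector over neurons, and
    rows are ranked by absolute value rather than signed value.
    """
    l1 = sum(abs(w) for w in weights)  # L1 norm of the whole vector
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    return [(i, weights[i], abs(weights[i]) / l1) for i in order[:top_k]]


# Toy 4-neuron vector (hypothetical values), printed in the dashboard's layout.
for idx, w, share in neuron_alignment([0.0, 0.5, -0.3, 0.2], top_k=2):
    print(f"{idx:<6} {w:+.2f}  {share:.1%}")
```

Under this reading, a row with weight +0.09 at 0.3% of L₁ implies a total L₁ norm of about 30, so the mass is spread thinly across many neurons.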
Correlated Neurons
Index   Pearson Corr.   Cos. Sim.
1823    +0.09           0.04
1055    +0.09           0.04
1213    +0.09           0.05
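Both columns of the Correlated Neurons table are standard statistics. In this minimal sketch, the correlation column is assumed to be the Pearson correlation between the feature's activations and a neuron's activations over the same tokens. The cosine-similarity column is assumed to compare the feature's weight vector with that neuron's direction. The helper names and toy data are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length activation sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def cosine_similarity(u, v):
    """Cosine of the angle between two weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


# Hypothetical per-token activations for a feature and one neuron.
feature_acts = [0.0, 1.2, 0.0, 0.4, 0.0]
neuron_acts = [0.1, 0.9, -0.2, 0.3, 0.0]
print(round(pearson(feature_acts, neuron_acts), 3))
```

The two numbers answer different questions. Correlation asks whether the feature and the neuron fire together, and cosine similarity asks whether their weight directions point the same way. Low values on both, as in the table above, suggest the feature is not simply one neuron in disguise.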
Negative Logits
utop        -1.05
palet       -0.95
ideolog     -0.93
santiago    -0.90
logis       -0.90
loto        -0.88
sement      -0.86
psychiat    -0.85
gend        -0.84
palab       -0.84
Positive Logits
jątk            0.74
Dziękuję        0.70
bénéficiaire    0.67
durante         0.64
Materiał        0.63
jusqu           0.63
Dijo            0.61
Dzięki          0.61
ypeł            0.61
pueden          0.61
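Logit lists like these are typically produced by projecting the feature's output direction through the unembedding and sorting tokens by the resulting score. The sketch below assumes the direction already lives in the residual stream. For an MLP feature, it would first need to pass through the MLP output weights. `logit_weights`, `top_and_bottom`, and the toy vocabulary are hypothetical.

```python
def logit_weights(direction, unembed, vocab):
    """Score each token by the dot product of its unembedding row with `direction`.

    Assumptions: `unembed` holds one row per vocabulary token, each of length
    d_model, and `direction` is already expressed in the residual stream.
    """
    scores = {
        tok: sum(w * d for w, d in zip(row, direction))
        for tok, row in zip(vocab, unembed)
    }
    return sorted(scores.items(), key=lambda kv: kv[1])  # ascending by score


def top_and_bottom(direction, unembed, vocab, k=10):
    """Return the k highest-scoring and k lowest-scoring tokens."""
    ranked = logit_weights(direction, unembed, vocab)
    positive = ranked[::-1][:k]  # highest scores first
    negative = ranked[:k]        # most negative scores first
    return positive, negative


# Toy 2-dimensional example (hypothetical tokens and weights).
vocab = ["a", "b", "c"]
unembed = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
pos, neg = top_and_bottom([2.0, 0.5], unembed, vocab, k=1)
print(pos, neg)
```

Read this way, the positive list above (Polish, French, and Spanish subword tokens) is the set of tokens this feature most directly pushes up. The negative list is the set it pushes down.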
Activation Density: 0.195%
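Activation density is the fraction of tokens on which the feature is active. A density of 0.195% works out to roughly 2 tokens in every 1,000. This minimal sketch assumes "active" means an activation strictly above zero. The `threshold` parameter is a hypothetical knob.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Toy example: the feature fires on 1 of 4 tokens.
print(f"{activation_density([0.0, 0.0, 1.5, 0.0]):.3%}")
```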