INDEX
Explanations
dash-separated phrases or quotes
Neuron Alignment
Index   Value   % of L₁
1350    +0.16   0.6%
381     +0.14   0.5%
2019    +0.13   0.5%
Correlated Neurons
Index   P. Corr.   Cos Sim.
795     +0.16      0.03
122     +0.14      0.04
214     +0.13      0.03
Negative Logits
Hvem         -0.87
Flere        -0.81
Hvorfor      -0.76
Hvordan      -0.75
ējās         -0.75
Hvad         -0.72
Ikke         -0.71
maintien     -0.71
Hvor         -0.69
Pozdrawiam   -0.69
Positive Logits
magis       1.05
kafe        1.02
antik       1.01
teras       1.00
culturali   0.99
kram        0.96
adal        0.96
logis       0.96
hant        0.96
utop        0.96
Activations Density: 0.092%