INDEX
Explanations
names of specific individuals
Neuron Alignment
Index   Value   % of L₁
481     +0.14   0.6%
1096    +0.14   0.6%
370     +0.13   0.5%
Correlated Neurons
Index   P. Corr.   Cos Sim.
981     +0.14      0.09
481     +0.14      0.06
227     +0.13      0.07
Negative Logits
Flere       -0.84
Kruse       -0.70
elegante    -0.65
Hvordan     -0.65
Hvem        -0.65
Hermans     -0.62
Schreiber   -0.58
Hvorfor     -0.57
Hvor        -0.57
Schröder    -0.56
Positive Logits
stopp     0.90
paff      0.89
udd       0.81
bandung   0.81
noss      0.78
tass      0.78
obb       0.78
Krzysz    0.78
fupp      0.77
milano    0.77
Activations Density: 0.411%