INDEX
Explanations
terms related to global issues or entities
Neuron Alignment
Index    Value    % of L₁
1870     +0.15    0.5%
1777     +0.12    0.4%
131      +0.11    0.4%
Correlated Neurons
Index    P. Corr.    Cos Sim.
1777     +0.15       0.03
131      +0.12       0.03
1909     +0.11       0.03
Negative Logits
romain             -0.57
InjectAttribute    -0.54
Cinéma             -0.50
unve               -0.49
POETRY             -0.48
vété               -0.48
disreg             -0.48
aggress            -0.48
carus              -0.48
pettico            -0.48
Positive Logits
global      1.02
Global      0.96
global      0.96
Global      0.94
globales    0.92
GLOBAL      0.82
globale     0.81
lobal       0.80
GLOBAL      0.79
globe       0.67
Activations Density 0.081%