INDEX
Explanations
words related to legal and social issues, as well as mentions of specific individuals and locations
Neuron Alignment
Index    Value    % of L₁
1978     +0.12    0.4%
411      +0.12    0.4%
289      +0.11    0.4%
Correlated Neurons
Index    P. Corr.    Cos Sim.
289      +0.12       0.05
101      +0.12       0.06
1262     +0.11       0.06
Negative Logits
Mémoires      -0.99
soigne        -0.97
Violon        -0.92
lele          -0.92
répon         -0.91
rafra         -0.88
Moderato      -0.88
parlamento    -0.87
Lég           -0.87
Châ           -0.87
Positive Logits
want           0.74
know           0.69
necessarily    0.68
need           0.68
dont           0.66
have           0.66
t              0.64
niestety       0.63
deserve        0.61
t              0.60
Activations Density 0.228%