INDEX
Explanations
statements about controversial statements or actions
Neuron Alignment
Index   Value   % of L₁
2019    +0.13   0.4%
964     +0.12   0.4%
604     +0.10   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
2019    +0.13      0.12
1938    +0.12      0.10
768     +0.10      0.10
Negative Logits
gracilis       -0.75
zima           -0.69
serai          -0.67
Fuckin         -0.66
moze           -0.63
kawaida        -0.60
alpes          -0.60
occidentalis   -0.58
vne            -0.57
rano           -0.57
Positive Logits
Needless          0.55
Needless          0.52
Edizioni          0.50
paradiso          0.49
dieux             0.48
Thankfully        0.48
Cinéma            0.48
InjectAttribute   0.47
toBeDefined       0.47
natale            0.47
Activations Density: 1.610%