INDEX
Explanations
titles of films, books, and articles
Neuron Alignment
Index    Value    % of L₁
297      +0.13    0.4%
776      +0.12    0.4%
906      +0.12    0.3%
Correlated Neurons
Index    P. Corr.    Cos Sim.
776      +0.13       0.06
147      +0.12       0.03
575      +0.12       0.04
Negative Logits
xxiii        -0.56
xxv          -0.56
Bedankt      -0.56
Veel         -0.56
xxvi         -0.55
Jakie        -0.55
regardant    -0.54
Prí          -0.53
Pře          -0.53
whofe        -0.53
Positive Logits
minimalis    0.82
palab        0.77
utop         0.73
gmbh         0.72
demen        0.71
kuns         0.70
abnorm       0.70
lapto        0.70
verba        0.69
pietre       0.69
Activations Density 0.318%