INDEX
Explanations
references to specific locations and events
Neuron Alignment
Index   Value   % of L₁
111     +0.30   1.7%
491     +0.15   0.9%
71      +0.13   0.7%
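The card does not define the Neuron Alignment columns. The sketch below shows one plausible reading. It assumes that Value is the feature's decoder weight on each neuron and that % of L₁ is that weight's magnitude divided by the L₁ norm of the whole decoder vector. Ranking neurons by absolute weight is also an assumption. The function name and its input are hypothetical.

```python
def neuron_alignment(decoder_row, k=3):
    """Top-k neurons by |decoder weight|, each with its share of the L1 norm.

    decoder_row is a hypothetical list of the feature's decoder weights,
    one float per neuron.
    """
    l1 = sum(abs(w) for w in decoder_row)
    # Rank neuron indices by weight magnitude (assumed ranking rule).
    ranked = sorted(range(len(decoder_row)), key=lambda i: abs(decoder_row[i]), reverse=True)
    return [(i, decoder_row[i], abs(decoder_row[i]) / l1) for i in ranked[:k]]


# Toy decoder row; a real one has one entry per neuron.
rows = neuron_alignment([0.125, -0.5, 0.25, 0.125], k=2)
for index, value, share in rows:
    print(f"{index}  {value:+.2f}  {share:.1%}")
```

Under this reading, a single weight of +0.30 that accounts for only 1.7% of the L₁ norm means the decoder vector is spread thinly across many neurons, not concentrated on one.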
Correlated Neurons
Index   P. Corr.   Cos Sim.
71      +0.30      0.28
4       +0.15      0.26
156     +0.13      0.26
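The card also leaves the Correlated Neurons columns undefined. The sketch below assumes that both columns compare the feature's activations with a neuron's activations over the same token positions. Under that assumption, P. Corr. is the Pearson correlation, and Cos Sim. is its uncentered counterpart: the cosine similarity of the raw activation vectors. The helper names and toy activations are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation: dot product of mean-centered series over the product of their norms."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def cosine_similarity(xs, ys):
    """Cosine similarity: the same ratio computed without subtracting the means."""
    dot = sum(x * y for x, y in zip(xs, ys))
    return dot / (math.sqrt(sum(x * x for x in xs)) * math.sqrt(sum(y * y for y in ys)))


# Hypothetical per-token activations for one feature and one neuron.
feature_acts = [0.0, 2.0, 0.0, 1.5, 0.0]
neuron_acts = [0.1, 0.9, -0.2, 0.4, 0.3]
print(round(pearson(feature_acts, neuron_acts), 3), round(cosine_similarity(feature_acts, neuron_acts), 3))
```

A sparse feature is zero on most tokens, so subtracting the mean changes the result. This is one reason the two columns need not agree.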
Negative Logits
zo         -1.61
lie        -1.58
?          -1.58
depend     -1.43
bil        -1.42
âĢĤ        -1.40
nickname   -1.36
depended   -1.33
mon        -1.32
pseud      -1.32
Positive Logits
Jobs       1.76
úblic      1.74
»¿         1.66
shoots     1.66
reads      1.63
aughs      1.60
adeon      1.52
etica      1.50
releases   1.44
Review     1.39
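The Negative and Positive Logits lists read most naturally as the feature's direct effect on the output vocabulary. The sketch below assumes that each token's score is the feature's decoder vector multiplied by the unembedding matrix, ignoring any final layer-norm scaling. The card does not confirm this, and the toy matrix, vocabulary, and function name are hypothetical.

```python
def logit_effects(decoder_row, unembed, vocab, k=3):
    """Return the k most negative and k most positive direct logit effects.

    decoder_row: the feature's decoder vector (length d_model).
    unembed: d_model rows, each holding one entry per vocabulary token.
    """
    # Score for token t = dot product of the decoder vector with column t of the unembedding.
    scores = [
        sum(decoder_row[d] * unembed[d][t] for d in range(len(decoder_row)))
        for t in range(len(vocab))
    ]
    order = sorted(range(len(vocab)), key=lambda t: scores[t])  # ascending
    negative = [(vocab[t], scores[t]) for t in order[:k]]
    positive = [(vocab[t], scores[t]) for t in reversed(order[-k:])]
    return negative, positive


# Toy example: d_model = 2, vocabulary of four tokens.
vocab = ["a", "b", "c", "d"]
unembed = [
    [1.5, -1.0, 0.5, -2.0],
    [0.0, 0.5, 1.0, 0.0],
]
neg, pos = logit_effects([1.0, 0.5], unembed, vocab, k=2)
print(neg)  # [('d', -2.0), ('b', -0.75)]
print(pos)  # [('a', 1.5), ('c', 1.0)]
```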
Activations Density 3.326%
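Activations Density is presumably the fraction of sampled tokens on which the feature is active. The sketch below assumes that "active" means an activation strictly above zero. The threshold, function name, and sample are hypothetical. Under this reading, 3.326% means the feature fires on roughly one token in thirty.

```python
def activation_density(acts, threshold=0.0):
    """Fraction of token positions whose activation exceeds the threshold."""
    return sum(1 for a in acts if a > threshold) / len(acts)


# Hypothetical activations of one feature over eight tokens.
acts = [0.0, 0.0, 1.2, 0.0, 0.0, 0.0, 0.4, 0.0]
density = activation_density(acts)
print(f"Activations Density {density:.3%}")  # prints "Activations Density 25.000%"
```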