INDEX
Explanations
terms related to memory and recollection
Neuron Alignment
Index   Value   % of L₁
376     +0.22   1.2%
148     +0.20   1.2%
115     +0.18   1.1%
Correlated Neurons
Index   P. Corr.   Cos Sim.
462     +0.22      0.02
148     +0.20      0.01
280     +0.18      0.01
Negative Logits
«       -2.29
nered   -1.97
½       -1.92
pired   -1.85
į       -1.70
»¿      -1.67
Ń       -1.65
ainen   -1.65
µ       -1.64
        -1.62
Positive Logits
nothing     1.54
privacy     1.46
please      1.44
stolen      1.44
asting      1.42
roviral     1.39
identally   1.38
gaps        1.37
giving      1.36
ently       1.36
Activations Density 0.024%