INDEX
Explanations
phrases starting with "that" followed by a verb in past tense or other information
Neuron Alignment
Index   Value   % of L₁
513     +0.13   0.4%
2019    +0.10   0.3%
1023    +0.10   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
513     +0.13      0.05
1023    +0.10      0.06
1350    +0.10      0.05
Negative Logits
lele     -0.99
siena    -0.99
ananas   -0.96
koz      -0.96
makro    -0.95
Meksi    -0.92
avto     -0.90
eki      -0.90
lampa    -0.90
adal     -0.89
Positive Logits
shouldn     0.68
resembles   0.67
resonates   0.62
might       0.62
involves    0.61
requires    0.61
could       0.61
deserves    0.61
hasn        0.61
ressemble   0.59
Activations Density: 0.210%