INDEX
Explanations
repetitions of the word "you."
Neuron Alignment
Index   Value   % of L₁
410     +0.16   0.9%
241     +0.14   0.8%
59      +0.12   0.7%
Correlated Neurons
Index   P. Corr.   Cos Sim.
410     +0.16      0.03
59      +0.14      0.02
477     +0.12      0.03
Negative Logits
onymous   -2.10
ogr       -1.99
asleep    -1.66
otle      -1.63
igraph    -1.58
ocal      -1.55
emic      -1.52
onym      -1.51
uric      -1.50
ocur      -1.50
Positive Logits
´         2.22
ī         2.11
Ŀ         2.08
illard    2.07
ettes     2.06
ľ         2.01
¯         2.01
ffer      1.99
¿         1.89
µ         1.89
Activations Density 0.174%