INDEX
Explanations
instances of the word "gets."
Neuron Alignment
Index   Value   % of L₁
376     +0.17   1.0%
193     +0.12   0.7%
53      +0.12   0.7%
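
The "% of L₁" column looks consistent with each value's magnitude divided by the L₁ norm of the whole direction vector: 0.17 and 0.12 would give about 1.0% and 0.7% if that norm were roughly 17. The following is a minimal Python sketch under that assumption. The function name and the toy vector are hypothetical and are not taken from the dashboard.

```python
def top_aligned_neurons(direction, k=3):
    """Return the k largest-magnitude entries as (index, value, share of L1 norm).

    Assumption: "% of L1" means abs(value) / sum(abs(all values)).
    """
    l1 = sum(abs(v) for v in direction)
    if l1 == 0:
        return []
    # Rank neuron indices by absolute value, largest first (ties keep index order).
    ranked = sorted(range(len(direction)), key=lambda i: abs(direction[i]), reverse=True)
    return [(i, direction[i], abs(direction[i]) / l1) for i in ranked[:k]]


# Toy vector for illustration only; not the feature shown above.
toy = [0.5, -0.25, 0.0, 0.25]
rows = top_aligned_neurons(toy, k=2)
for index, value, share in rows:
    print(f"{index:<7} {value:+.2f}   {share:.1%}")
```

With a real direction vector, the same ranking would produce rows in the Index / Value / % of L₁ layout used above.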
Correlated Neurons
Index   P. Corr.   Cos Sim.
378     +0.17      0.01
146     +0.12      0.01
66      +0.12      0.01
Negative Logits
myself      -1.82
bey         -1.70
ppler       -1.58
moonlight   -1.45
silence     -1.43
gg          -1.42
inen        -1.38
certain     -1.34
asma        -1.34
small       -1.32
Positive Logits
mith        2.27
istical     1.98
ystem       1.79
erving      1.77
eful        1.75
ince        1.74
minded      1.72
abad        1.71
hearted     1.71
cript       1.71
Activations Density 0.018%