INDEX
Explanations
instances of the word "Used."
Neuron Alignment
Index   Value   % of L₁
484     +0.11   0.6%
166     +0.11   0.6%
316     +0.10   0.6%
Correlated Neurons
Index   P. Corr.   Cos Sim.
35      +0.11      0.11
371     +0.11      0.12
18      +0.10      0.08
Negative Logits
ķ          -1.55
®          -1.55
hematic    -1.51
drinking   -1.49
bsite      -1.43
mma        -1.43
night      -1.40
crawl      -1.40
toe        -1.38
ERY        -1.38
Positive Logits
![**       1.67
wisely     1.61
***        1.44
Clause     1.43
iah        1.41
atti       1.39
backs      1.38
soever     1.37
Himself    1.35
_(         1.34
Activations Density 0.204%