INDEX
Explanations
the occurrence of the word "two."
Neuron Alignment
Index  Value  % of L₁
504    +0.12  0.6%
367    +0.10  0.5%
197    +0.10  0.5%
Correlated Neurons
Index  P. Corr.  Cos Sim.
139    +0.12     0.10
264    +0.10     0.07
366    +0.10     0.07
Negative Logits
NOTICE          -1.64
ŀ               -1.51
reasons         -1.50
ARI             -1.38
BASIS           -1.38
audible         -1.36
advertisements  -1.36
ocyanate        -1.34
ños             -1.33
aria            -1.32
Positive Logits
ston       1.63
gets       1.60
osity      1.55
Gs         1.55
genstein   1.54
Std        1.51
libs       1.46
geometric  1.45
weg        1.45
friends    1.44
Activations Density 0.065%