INDEX
Explanations
instances of the word "one."
Neuron Alignment
Index   Value   % of L₁
161     +0.16   0.9%
259     +0.13   0.7%
320     +0.12   0.6%
Correlated Neurons
Index   P. Corr.   Cos Sim.
122     +0.16      0.01
161     +0.13      0.01
355     +0.12      0.02
Negative Logits
reviews     -1.78
cents       -1.66
="$(        -1.55
»¿          -1.49
ftware      -1.49
journal     -1.43
aeda        -1.41
software    -1.40
publisher   -1.37
retailers   -1.33
Positive Logits
suspended   1.70
himself     1.62
grown       1.62
Guards      1.60
herself     1.59
screened    1.59
wash        1.53
divided     1.50
aligned     1.50
disturbed   1.50
Activations Density 0.098%