INDEX
Explanations
instances of the word "stand."
Neuron Alignment
Index   Value   % of L₁
380     +0.14   0.8%
172     +0.13   0.7%
111     +0.13   0.7%
Correlated Neurons
Index   P. Corr.   Cos Sim.
380     +0.14      0.01
172     +0.13      0.01
381     +0.13      0.01
Negative Logits
vain              -1.75
CLAIM             -1.74
·¸                -1.66
overtime          -1.66
MERCHANTABILITY   -1.66
expired           -1.65
disguise          -1.60
LLOW              -1.59
distress          -1.55
distressed        -1.55
POSITIVE LOGITS
ards      2.23
finder    2.00
edly      1.91
ography   1.84
iative    1.83
iÄĩ       1.83
bench     1.77
ingly     1.76
ers       1.76
iom       1.75
Activations Density: 0.007%