INDEX
Explanations
Instances of the word "seem."
Neuron Alignment
Index | Value | % of L₁
---|---|---
376 | +0.18 | 1.0%
144 | +0.14 | 0.8%
148 | +0.12 | 0.7%
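The page does not define the Neuron Alignment columns. The sketch below is one way they could be computed. It assumes that "Value" is an entry of the feature's decoder vector over MLP neurons and that "% of L₁" is that entry's magnitude divided by the L₁ norm of the whole vector. The function name `top_neuron_alignment` and the toy vector are hypothetical.

```python
def top_neuron_alignment(decoder_vec, k=3):
    """Return (neuron_index, weight, fraction_of_l1) for the k largest-|weight| entries.

    Assumption (not confirmed by the dashboard): decoder_vec is the feature's
    decoder vector over MLP neurons, and "% of L1" means
    |weight| / sum(|w| for w in decoder_vec). Entries are ranked by magnitude;
    the dashboard may rank by signed value instead.
    """
    l1 = sum(abs(w) for w in decoder_vec)
    # sorted() is stable, so equal magnitudes keep their original index order.
    ranked = sorted(range(len(decoder_vec)), key=lambda i: abs(decoder_vec[i]), reverse=True)
    return [(i, decoder_vec[i], abs(decoder_vec[i]) / l1) for i in ranked[:k]]


toy = [0.125, -0.5, 0.25, 0.125]  # hypothetical 4-neuron decoder vector
rows = top_neuron_alignment(toy)
```

Under this reading, a top neuron that carries only 1.0% of the L₁ mass suggests the feature is spread across many neurons rather than aligned with any single one.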
Correlated Neurons
Index | Pearson Corr. | Cos. Sim.
---|---|---
214 | +0.18 | 0.03
144 | +0.14 | 0.03
109 | +0.12 | 0.03
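The page does not say exactly which quantities are compared in the correlation and cosine-similarity columns of Correlated Neurons. One plausible reading, assumed here, is that the correlation is a Pearson correlation between the feature's activations and a neuron's activations over the same tokens, and that the cosine similarity compares two weight vectors. Both measures need only the standard library. The helper names below are hypothetical.

```python
import math

# Hypothetical helpers: the dashboard does not document its exact formulas.


def pearson_corr(xs, ys):
    """Pearson correlation of two equal-length activation sequences (mean-centered)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


def cosine_sim(u, v):
    """Cosine similarity of two equal-length vectors (no mean-centering)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


feature_acts = [0.0, 0.0, 1.5, 0.0, 2.0]  # toy per-token activations
neuron_acts = [0.1, 0.0, 0.9, 0.2, 1.1]
print(round(pearson_corr(feature_acts, neuron_acts), 3))
```

The two measures answer different questions. The first measures co-activation over data, and the second measures geometric overlap between weights. A neuron can therefore show a clearly positive correlation alongside a near-zero cosine similarity.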
Negative Logits
Token | Logit
---|---
brush | -1.59
herself | -1.58
inse | -1.56
 | -1.55
rely | -1.46
story | -1.44
suit | -1.43
answer | -1.41
ise | -1.38
extent | -1.38
Positive Logits
Token | Logit
---|---
manship | 1.88
hof | 1.60
encial | 1.59
hoc | 1.56
sterdam | 1.55
vu | 1.54
ugin | 1.51
ingly | 1.50
unable | 1.47
dal | 1.46
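The Negative and Positive Logits lists rank vocabulary tokens by how strongly the feature pushes their logits down or up. The page does not confirm how these numbers are obtained. A common approach, assumed here, multiplies the feature's decoder direction (taken to be already expressed in the residual stream) by the unembedding matrix, ignores any final layer norm, and sorts the result. The sketch uses plain lists. `logit_effects`, `top_and_bottom`, and the toy matrix are hypothetical.

```python
def logit_effects(decoder_dir, unembed):
    """Direct effect of a feature direction on each token's logit.

    decoder_dir: list of length d_model (assumed to live in the residual stream).
    unembed: d_model rows, each a list of length vocab_size.
    """
    vocab_size = len(unembed[0])
    return [
        sum(decoder_dir[j] * unembed[j][t] for j in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]


def top_and_bottom(effects, tokens, k):
    """Return the k most positive and the k most negative (token, effect) pairs."""
    order = sorted(range(len(effects)), key=lambda t: effects[t])  # ascending
    bottom = [(tokens[t], effects[t]) for t in order[:k]]
    # Slice from len(order) - k so that k == 0 yields an empty list.
    top = [(tokens[t], effects[t]) for t in reversed(order[len(order) - k:])]
    return top, bottom


effects = logit_effects([1.0, 2.0], [[1.0, 0.0, -2.0], [0.0, 1.0, 0.0]])
top, bottom = top_and_bottom(effects, ["a", "b", "c"], k=1)
```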
Activation Density: 0.334%
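Activation density is presumably the fraction of tokens in some evaluation set on which the feature's activation is above zero. At 0.334%, that is roughly one token in 300 (1 / 0.00334 ≈ 299). The sketch below follows that assumed definition. The function name and the `threshold` parameter are hypothetical.

```python
def activation_density(acts, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold` (assumed definition)."""
    if not acts:
        raise ValueError("need at least one activation")
    return sum(1 for a in acts if a > threshold) / len(acts)


toy_acts = [0.0, 0.0, 0.5, 0.0, 1.2, 0.0, 0.0, 0.0]  # 2 of 8 toy tokens fire
density = activation_density(toy_acts)
```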