INDEX
Explanations
instances of the word "see"
Neuron Alignment
Index  Value  % of L₁
197    +0.13  0.7%
478    +0.12  0.6%
116    +0.10  0.6%
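A neuron-alignment table like the one above could be computed roughly as in this minimal sketch. It assumes, as sae_vis-style dashboards do but this page does not confirm, that "Value" is the feature's decoder weight on each MLP neuron and that "% of L₁" is that weight's absolute value divided by the L₁ norm of the whole decoder vector. Ranking by absolute weight is also an assumption. The function name `neuron_alignment` and the toy vector are hypothetical.

```python
def neuron_alignment(decoder_row, k=3):
    """Return (neuron index, decoder weight, share of L1 norm) for the top-k neurons.

    Assumption: decoder_row[i] is the feature's decoder weight on neuron i.
    """
    # L1 norm of the feature's full decoder vector.
    l1 = sum(abs(w) for w in decoder_row)
    # Rank neurons by absolute weight, largest first (assumed ranking rule).
    top = sorted(range(len(decoder_row)), key=lambda i: abs(decoder_row[i]), reverse=True)[:k]
    return [(i, decoder_row[i], abs(decoder_row[i]) / l1) for i in top]


# Toy decoder vector with hypothetical values, not taken from this feature.
row = [0.05, -0.20, 0.40, 0.10, -0.25]
for idx, value, share in neuron_alignment(row):
    print(f"{idx:>5}  {value:+.2f}  {share:.1%}")
```

Under this reading, a share of 0.7% means the top neuron carries only a small fraction of the decoder vector's L₁ mass. In other words, the feature is not closely aligned with any single neuron.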
Correlated Neurons
Index  Pearson Corr.  Cosine Sim.
324    +0.13          0.03
103    +0.12          0.03
223    +0.10          0.02
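The two correlation columns could be produced roughly as in this minimal sketch. It assumes, without confirmation from this page, that both columns compare the feature's activations with each neuron's activations over the same dataset tokens. Under that assumption, "Pearson Corr." is their Pearson correlation and "Cosine Sim." is the cosine similarity of the two raw activation vectors. Ranking by Pearson correlation is also assumed. All function names and toy data are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length activation lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def cosine(xs, ys):
    """Cosine similarity of the raw (uncentered) activation vectors."""
    dot = sum(x * y for x, y in zip(xs, ys))
    return dot / (math.sqrt(sum(x * x for x in xs)) * math.sqrt(sum(y * y for y in ys)))


def correlated_neurons(feature_acts, neuron_acts, k=3):
    """neuron_acts[j] holds neuron j's activations on the same tokens as feature_acts."""
    scored = [(j, pearson(feature_acts, acts), cosine(feature_acts, acts))
              for j, acts in enumerate(neuron_acts)]
    # Sort by Pearson correlation, highest first (assumed ranking rule).
    scored.sort(key=lambda t: t[1], reverse=True)
    return scored[:k]


# Toy activations over four tokens (hypothetical values).
feature = [0.0, 2.0, 0.0, 1.0]
neurons = [[0.1, 1.9, 0.0, 1.1], [1.0, 0.0, 1.0, 0.0], [0.5, 0.4, 0.6, 0.5]]
for j, r, c in correlated_neurons(feature, neurons):
    print(f"{j:>5}  {r:+.2f}  {c:.2f}")
```

The two metrics can diverge, as they do in the table. Pearson correlation centers each series before comparing them, whereas cosine similarity compares the raw vectors.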
Negative Logits
¤      -2.04
¸      -1.80
½      -1.75
eners  -1.65
ime    -1.58
į      -1.58
ute    -1.54
¬      -1.52
Ń      -1.51
den    -1.50
Positive Logits
cref        2.22
cknowled    1.78
recent      1.52
andum       1.52
above       1.51
ishes       1.51
embodiment  1.50
below       1.50
Chapter     1.49
mining      1.46
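Top and bottom logit lists like these are commonly obtained by projecting the feature's decoder direction onto the unembedding. The minimal sketch below shows this, assuming that each listed value is the dot product of the decoder direction with the token's unembedding column. Any extra normalization a real dashboard may apply is not modeled. The function name, the tiny two-dimensional "model", and the four-token vocabulary are all hypothetical.

```python
def logit_effects(decoder_dir, unembed, vocab, k=3):
    """Return the k most positive and k most negative direct logit effects.

    decoder_dir: list of d_model floats (the feature's decoder direction).
    unembed: d_model rows, each with one entry per vocab token.
    """
    d_model = len(decoder_dir)
    # Direct effect on token t = sum_i decoder_dir[i] * unembed[i][t].
    scores = [sum(decoder_dir[i] * unembed[i][t] for i in range(d_model))
              for t in range(len(vocab))]
    order = sorted(range(len(vocab)), key=lambda t: scores[t])  # ascending
    negative = [(vocab[t], scores[t]) for t in order[:k]]
    positive = [(vocab[t], scores[t]) for t in reversed(order[-k:])]
    return positive, negative


# Toy example with hypothetical weights.
vocab = ["see", "above", "below", "ime"]
direction = [1.0, 0.5]
W_U = [[2.0, 1.0, 0.5, -1.0],
       [0.2, 2.0, 1.0, -2.0]]
pos, neg = logit_effects(direction, W_U, vocab, k=2)
print("positive:", [(tok, round(s, 2)) for tok, s in pos])
print("negative:", [(tok, round(s, 2)) for tok, s in neg])
```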
Activation Density: 0.080%
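An activation density of 0.080% corresponds to the feature firing on about 8 of every 10,000 tokens. The minimal sketch below shows this arithmetic, assuming density is the fraction of dataset tokens on which the feature's activation is strictly positive. The zero threshold and the function name are assumptions, and the activation list is synthetic.

```python
def activation_density(feature_acts, threshold=0.0):
    """Fraction of tokens where the feature activation exceeds the threshold.

    Assumption: a feature counts as active when its activation is > 0.
    """
    active = sum(1 for a in feature_acts if a > threshold)
    return active / len(feature_acts)


# Synthetic data: 8 active tokens out of 10,000.
acts = [0.0] * 9992 + [1.5] * 8
print(f"{activation_density(acts):.3%}")  # prints 0.080%
```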