Explanations
the use of adverbs and phrases that characterize frequency or manner in a context
Neuron Alignment
Index   Value   % of L₁
153     +0.14   0.8%
248     +0.13   0.7%
419     +0.12   0.7%
Correlated Neurons
Index   P. Corr.   Cos Sim.
317     +0.14      0.09
248     +0.13      0.09
219     +0.12      0.08
Negative Logits
Token     Logit
ôle       -1.44
sit       -1.40
acker     -1.36
elle      -1.35
uto       -1.35
{         -1.33
meet      -1.31
ari       -1.30
noreply   -1.28
ette      -1.27
Positive Logits
Token            Logit
↵                2.79
<|padding|>      2.79
↵                2.79
<|outofrange|>   2.79
↵                2.79
↵                2.79
č↵               2.79
                 2.79
↵                2.79
                 2.79
Activations Density 1.405%