INDEX
Explanations
sentences containing narratives about people or events
Neuron Alignment
Index   Value   % of L₁
1577    +0.26   0.9%
394     +0.22   0.8%
674     +0.17   0.6%
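The Neuron Alignment panel appears to list the neurons that carry the largest share of the feature's weight. Below is a minimal sketch of how such a table could be computed. It assumes that "Value" is the signed entry of the feature's decoder vector for each neuron and that "% of L₁" is that entry's absolute value divided by the L₁ norm of the whole vector. The function name and the toy vector are hypothetical.

```python
def neuron_alignment(decoder_vec, k=3):
    """Return the k neurons with the largest absolute weight in a feature's decoder vector.

    Assumption: 'Value' is the signed weight and '% of L1' is |weight| divided by
    the L1 norm (sum of absolute weights) of the whole vector, as a percentage.
    """
    l1 = sum(abs(w) for w in decoder_vec)
    # Neuron indices ordered by absolute weight, largest first.
    ranked = sorted(range(len(decoder_vec)), key=lambda i: abs(decoder_vec[i]), reverse=True)
    return [(i, decoder_vec[i], 100 * abs(decoder_vec[i]) / l1) for i in ranked[:k]]


# Usage with a hypothetical 5-neuron decoder vector.
for index, value, pct in neuron_alignment([0.05, -0.40, 0.25, 0.10, 0.20], k=3):
    print(f"{index}  {value:+.2f}  {pct:.1f}%")
```

In the real table the percentages are small (under 1%), which under this assumption would mean the feature's weight is spread across many neurons rather than concentrated in a few.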
Correlated Neurons
Index   Pearson Corr.   Cosine Sim.
184     +0.26           0.02
1531    +0.22           0.08
16      +0.17           0.17
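The Correlated Neurons panel reports two different measures, and they can disagree. For example, neuron 184 has the highest correlation (+0.26) but a cosine similarity of only 0.02. The sketch below shows both computations under stated assumptions. The Pearson correlation is assumed to be taken between per-token activations of the feature and of a neuron. The cosine similarity is computed between two weight vectors, but which vectors the dashboard compares is not stated here, so they are treated as inputs. All data values are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length activation series.

    Undefined (division by zero) if either series is constant.
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


def cosine_similarity(u, v):
    """Cosine of the angle between two weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


# Hypothetical per-token activations of the feature and of one neuron.
feature_acts = [0.0, 1.2, 0.0, 0.8, 0.0]
neuron_acts = [0.1, 0.9, -0.2, 0.7, 0.0]
print(round(pearson(feature_acts, neuron_acts), 3))

# Hypothetical weight vectors: orthogonal, so the cosine similarity is 0.
print(round(cosine_similarity([1.0, 0.0, 0.5], [0.0, 1.0, 0.0]), 3))
```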
Negative Logits
reluct       -1.52
apprehen     -1.44
gaily        -1.43
disagre      -1.43
shenan       -1.41
withal       -1.39
intersper    -1.37
unspeak      -1.37
encomp       -1.34
strick       -1.29
Positive Logits
<bos>        1.09
Citiți       0.75
↘            0.68
↗            0.65
Alguna       0.63
Și           0.62
Identyfik    0.61
lenker       0.60
Pentru       0.60
Autoritní    0.60
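The two logit lists rank vocabulary tokens by how strongly the feature is assumed to push their logits down or up. Below is a minimal sketch, assuming that the effect on token t is the dot product of the feature's decoder vector with the unembedding vector for t, and ignoring any final layer norm. The function name, the 2-dimensional toy model, and the 4-token vocabulary are all hypothetical.

```python
def logit_effects(decoder_vec, unembed, vocab, k=3):
    """Rank tokens by the (assumed) direct logit effect of a feature.

    decoder_vec: the feature's output direction.
    unembed: unembed[t] is the unembedding vector for vocab[t].
    Returns (most_negative, most_positive), each a list of (token, score).
    """
    scores = []
    for token, u in zip(vocab, unembed):
        # Direct effect: dot product of the feature direction with the token's unembedding.
        scores.append((token, sum(d * w for d, w in zip(decoder_vec, u))))
    scores.sort(key=lambda pair: pair[1])  # ascending by score
    most_negative = scores[:k]
    most_positive = scores[::-1][:k]  # descending by score
    return most_negative, most_positive


# Usage with a hypothetical 2-dimensional model and a 4-token vocabulary.
neg, pos = logit_effects(
    decoder_vec=[1.0, -0.5],
    unembed=[[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.5, 0.5]],
    vocab=["A", "B", "C", "D"],
    k=2,
)
print(neg)  # [('C', -1.0), ('B', -0.5)]
print(pos)  # [('A', 1.0), ('D', 0.25)]
```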
Activation Density: 5.859%
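Activation density is read here as the fraction of tokens on which the feature is active. Below is a minimal sketch, assuming "active" means an activation strictly above a threshold of 0. The function name and the 10-token activation list are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens on which the feature fires.

    Assumption: 'firing' means activation > threshold, with threshold 0 by default.
    """
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical activations over 10 tokens: the feature fires on 2 of them.
acts = [0.0, 0.5, 0.0, 0.0, 1.2, 0.0, 0.0, 0.0, 0.0, 0.0]
print(f"{100 * activation_density(acts):.3f}%")  # prints 20.000%
```

Under this reading, a density of 5.859% means the feature is nonzero on roughly 1 token in 17.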