INDEX
Explanations
phrases related to personal stories or narratives
Neuron Alignment
Index   Value   % of L₁
1133    +0.16   0.8%
1482    +0.14   0.7%
1451    +0.13   0.6%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1363    +0.16      0.07
468     +0.14      0.05
478     +0.13      0.03
Negative Logits
<bos>          -2.02
vainly         -1.08
ⓧ              -0.96
miscon         -0.90
merrily        -0.89
(blank)        -0.87
effectually    -0.86
/*!            -0.85
triumphantly   -0.85
nobly          -0.84
Positive Logits
ly        1.14
tramont   1.00
Luglio    0.94
kasa      0.94
dott      0.93
kac       0.92
pank      0.90
umo       0.89
ristor    0.89
saar      0.88
Activations Density: 1.682%