Explanations
New Auto-Interp: opinions or thoughts on various topics or ideas
Neuron Alignment
Index    Value    % of L₁
136      +0.10    0.3%
1252     +0.09    0.2%
1026     +0.08    0.2%
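A minimal sketch of how the Value and % of L₁ columns could be derived. It assumes that Value is the feature's decoder weight on each MLP neuron, that % of L₁ is that weight's absolute value divided by the L₁ norm of the whole decoder vector, and that neurons are ranked by absolute weight. All three are assumptions, and `neuron_alignment` is a hypothetical helper, not the dashboard's code.

```python
# Hypothetical helper. Assumes `decoder_vec` holds the feature's decoder
# weight for each MLP neuron and that "% of L1" means |weight| / L1 norm.
def neuron_alignment(decoder_vec, top_k=3):
    """Return (neuron index, weight, share of L1 norm) for the top_k neurons."""
    l1_norm = sum(abs(w) for w in decoder_vec)
    # Rank neurons by absolute decoder weight (assumed ordering).
    ranked = sorted(
        range(len(decoder_vec)),
        key=lambda i: abs(decoder_vec[i]),
        reverse=True,
    )
    return [(i, decoder_vec[i], abs(decoder_vec[i]) / l1_norm) for i in ranked[:top_k]]


if __name__ == "__main__":
    toy_decoder = [0.0625, -0.5, 0.25, 0.1875]  # toy values, not real data
    for idx, weight, share in neuron_alignment(toy_decoder):
        print(f"{idx:>4}  {weight:+.2f}  {share:.1%}")
```

Under this reading, the shares across all neurons sum to 100%, so a top neuron holding only 0.3% of the L₁ norm means the feature's decoder weight is spread thinly over many neurons rather than aligned with any single one.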
Correlated Neurons
Index    Pearson Corr.    Cos. Sim.
332      +0.10            0.04
1372     +0.09            0.04
284      +0.08            0.04
Negative Logits
lele       -1.16
uhr        -1.10
kasa       -1.08
kark       -1.08
pank       -1.07
saar       -1.07
!...       -1.06
territo    -1.05
kac        -1.04
haup       -1.03
Positive Logits
thoughts      0.90
regarding     0.86
opinions      0.85
opinion       0.84
views         0.75
concerning    0.72
Thoughts      0.69
about         0.67
thoughts      0.67
on            0.65
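The positive and negative logit lists can be read as the feature's direct effect on the output vocabulary. A minimal sketch, assuming each weight is the dot product of the feature's decoder vector (in residual-stream space) with a token's unembedding column, with any final LayerNorm ignored. `logit_effects`, the toy matrix, and the toy vocabulary are hypothetical.

```python
# Hypothetical sketch. Assumes each logit weight is the dot product of the
# feature's decoder vector with a token's unembedding column, ignoring any
# final LayerNorm.
def logit_effects(decoder_vec, unembed, vocab, k=2):
    """Return the k most negative and k most positive (token, weight) pairs.

    `unembed` is a d_model x vocab_size matrix given as a list of rows.
    """
    d_model = len(decoder_vec)
    effects = []
    for t, token in enumerate(vocab):
        weight = sum(decoder_vec[r] * unembed[r][t] for r in range(d_model))
        effects.append((token, weight))
    effects.sort(key=lambda pair: pair[1])  # ascending by weight
    negative = effects[:k]                  # most negative first
    positive = effects[::-1][:k]            # most positive first
    return negative, positive


if __name__ == "__main__":
    toy_decoder = [1.0, 0.5]                # toy values, not real data
    toy_unembed = [[1.0, -1.0, 0.0, 0.5],
                   [0.0, 1.0, -2.0, 0.5]]
    neg, pos = logit_effects(toy_decoder, toy_unembed, ["a", "b", "c", "d"])
    print("negative:", neg)
    print("positive:", pos)
```

Under this reading, a feature whose decoder direction points toward the unembeddings of tokens like "thoughts" and "opinions" pushes up their logits, while the fragmentary tokens in the negative list are the ones it pushes down.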
Activations Density: 0.239%
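Activation density is usually read as the fraction of tokens on which the feature fires. On that reading, 0.239% is roughly one token in 418 (1 / 0.00239 ≈ 418). A minimal sketch, assuming "fires" means a strictly positive activation. `activation_density` is a hypothetical helper.

```python
# Hypothetical helper. Assumes a feature "fires" on a token when its
# activation is strictly greater than `threshold` (0 by default).
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose feature activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


if __name__ == "__main__":
    toy_acts = [0.0] * 998 + [0.3, 2.1]  # toy values, not real data
    print(f"density: {activation_density(toy_acts):.3%}")
```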