INDEX
Explanations
phrases related to expressing opinions and thoughts
Neuron Alignment
Index   Value   % of L₁
605     +0.11   0.3%
1276    +0.10   0.3%
1506    +0.10   0.3%
Correlated Neurons
Index   Pearson Corr.   Cosine Sim.
1276    +0.11           0.07
555     +0.10           0.05
1677    +0.10           0.06
Negative Logits
Token    Logit
fta      -1.90
ftu      -1.74
aen      -1.64
Juf      -1.63
mef      -1.63
dises    -1.62
emphat   -1.61
fup      -1.59
effe     -1.58
vns      -1.56
Positive Logits
Token           Logit
it              0.90
there           0.88
they            0.83
if              0.79
we              0.78
you             0.75
don             0.73
I               0.73
its             0.72
unfortunately   0.71
Activations Density 0.217%