INDEX
Explanations
phrases related to society, responsibility, and duty
Neuron Alignment
Index   Value   % of L₁
872     +0.14   0.4%
1445    +0.13   0.4%
1892    +0.11   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1892    +0.14      0.06
1265    +0.13      0.05
390     +0.11      0.06
Negative Logits
Token   Logit
ftu     -1.36
haer    -1.29
lele    -1.28
ftate   -1.27
ftre    -1.27
paff    -1.23
ufe     -1.23
magis   -1.22
fta     -1.22
vns     -1.19
Positive Logits
Token          Logit
therefore      0.92
hence          0.81
this           0.79
thats          0.79
consequently   0.76
thus           0.75
it             0.75
if             0.74
yet            0.74
that           0.73
Activations Density 0.246%