Explanations
phrases that express contrast or opposition
Neuron Alignment
Index   Value   % of L₁
990     +0.09   0.3%
382     +0.09   0.2%
605     +0.08   0.2%
Correlated Neurons
Index   P. Corr.   Cos Sim.
382     +0.09      0.05
310     +0.09      0.03
605     +0.08      0.01
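The two columns of the Correlated Neurons table measure different things, which is why a neuron can score +0.09 on one and 0.01 on the other. The sketch below assumes that "P. Corr." means Pearson correlation and "Cos Sim." means cosine similarity; the page itself does not spell out either abbreviation, and the helper names and toy vectors are hypothetical. Pearson correlation subtracts the mean from each vector first, while cosine similarity does not.

```python
import math


def pearson(x, y):
    """Pearson correlation: cosine similarity of the mean-centered vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def cosine(x, y):
    """Cosine similarity: normalized dot product, with no mean-centering."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


# Toy vectors (not taken from the page): y is x shifted by a constant,
# so the two are perfectly correlated but do not point in the same direction.
x = [1.0, 2.0, 3.0]
y = [11.0, 12.0, 13.0]
print(pearson(x, y), cosine(x, y))
```

Which quantities the dashboard actually feeds into each measure (activations over a dataset, or weight vectors) is not stated on the page, so the sketch only illustrates the definitions.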
Negative Logits
sappi     -1.15
peculi    -1.14
incess    -1.12
alkoh     -1.10
meis      -1.06
franz     -1.06
pecuni    -1.03
nomine    -1.00
inder     -1.00
doman     -0.99
Positive Logits
fact        1.09
indeed      1.07
Indeed      1.02
actually    1.01
indeed      0.96
Actually    0.94
Indeed      0.91
actually    0.89
むしろ      0.83
Actually    0.79
Activations Density 0.267%
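As a rough illustration of what a density figure like the one above could mean, the sketch below computes the fraction of token positions on which a feature fires. This assumes "density" is the share of positions with activation above a threshold; the page does not state its definition, and the function name, threshold, and toy data are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of positions whose activation exceeds `threshold`.

    Assumed definition, not confirmed by the source page.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy data (not from the page): the feature fires on 1 of 4 positions.
toy_activations = [0.0, 0.0, 0.5, 0.0]
density = activation_density(toy_activations)
print(f"{density:.3%}")
```

Under this reading, a density of 0.267% would correspond to the feature being active on roughly 1 in 375 positions.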