INDEX
Explanations
instances of the word "even" being emphasized
Neuron Alignment
Index   Value   % of L₁
1482    +0.15   0.5%
605     +0.14   0.5%
544     +0.11   0.4%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1482    +0.15      0.05
544     +0.14      0.04
201     +0.11      0.03
Negative Logits
Token   Logit
igno    -0.90
exem    -0.86
antem   -0.84
orch    -0.84
robus   -0.81
ria     -0.80
illi    -0.77
agi     -0.77
erec    -0.77
immen   -0.76
Positive Logits
Token     Logit
even      0.98
even      0.91
EVEN      0.80
навіть    0.73
Even      0.72
though    0.71
despite   0.71
Even      0.71
EVEN      0.67
zelfs     0.66
Activations Density: 0.076%