INDEX
Explanations
the conjunction "but" and its usage in contrastive contexts
Neuron Alignment
Index   Value   % of L₁
50      +0.30   1.3%
381     +0.12   0.5%
1276    +0.11   0.5%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1276    +0.30      0.13
1068    +0.12      0.10
555     +0.11      0.10
Negative Logits
<bos>             -2.97
expandindo        -0.79
endwhile          -0.75
ⓧ                 -0.69
/***              -0.67
ContentAlignment  -0.67
protected         -0.63
Enllaços          -0.61
}]);              -0.61
//{               -0.61
Positive Logits
impra             1.62
reluct            1.54
shenan            1.53
disagre           1.49
hairc             1.48
unspeak           1.48
impractica        1.46
maneu             1.45
scrat             1.45
swarovski         1.42
Activations Density: 0.358%