Explanations
instances of the word "but" and its context in contrasting statements
Neuron Alignment
Index   Value   % of L₁
50      +0.33   1.4%
1068    +0.11   0.5%
528     +0.10   0.4%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1276    +0.33      0.11
1068    +0.11      0.10
555     +0.10      0.09
Negative Logits
<bos>         -2.82
/***          -0.88
ⓧ             -0.80
<?            -0.66
expandindo    -0.66
//*/          -0.63
/**           -0.62
/*++          -0.61
diagon        -0.60
};*/          -0.56
Positive Logits
hornblende    0.78
particolar    0.78
soulign       0.77
nhung         0.74
Minang        0.73
signora       0.72
But           0.70
ngunit        0.70
véhic         0.69
Putih         0.69
Activations Density 0.285%