INDEX
Explanations
sentences or phrases emphasizing assessments or opinions
Neuron Alignment
Index    Value    % of L₁
674      +0.11    0.3%
872      +0.10    0.3%
964      +0.08    0.2%
Correlated Neurons
Index    P. Corr.    Cos Sim.
2044     +0.11       0.06
186      +0.10       0.05
1118     +0.08       0.04
Negative Logits
optik        -0.69
antik        -0.67
kompati      -0.66
alkoh        -0.63
minimalis    -0.62
kristal      -0.61
Strukt       -0.59
keramik      -0.59
kompakt      -0.58
kosme        -0.58
Positive Logits
but      0.85
BUT      0.68
nhưng    0.68
But      0.65
but      0.64
But      0.64
BUT      0.58
แต่      0.58
pero     0.58
<bos>    0.56
Activations Density: 0.535%