INDEX
Explanations
comparative phrases indicating higher likelihood or probability
Neuron Alignment
Index    Value    % of L₁
198      +0.08    0.2%
1993     +0.08    0.2%
674      +0.08    0.2%
Correlated Neurons
Index    P. Corr.    Cos Sim.
1428     +0.08       0.03
1475     +0.08       0.02
116      +0.08       0.03
Negative Logits
mittances    -0.72
rumore       -0.68
Chinois      -0.67
Romains      -0.66
caratteri    -0.66
incess       -0.63
fumo         -0.63
Violon       -0.63
Cinéma       -0.63
cartier      -0.62
Positive Logits
<bos>     0.87
than      0.73
Than      0.63
než       0.59
Than      0.58
THAN      0.57
than      0.56
Likely    0.50
chance    0.49
likely    0.48
Activations Density: 0.225%