INDEX
Explanations
phrases expressing likelihood or probability
Neuron Alignment
Index   Value   % of L₁
1325    +0.11   0.4%
1392    +0.11   0.4%
812     +0.11   0.4%
Correlated Neurons
Index   P. Corr.   Cos Sim.
156     +0.11      0.03
1392    +0.11      0.03
168     +0.11      0.03
Negative Logits
aut       -0.43
Revenir   -0.42
Wol       -0.42
Sem       -0.41
Bois      -0.41
Schwar    -0.41
Gelen     -0.41
Sem       -0.41
nová      -0.40
commun    -0.39
Positive Logits
Likely        0.89
Likely        0.88
likely        0.86
likely        0.84
NTIS          0.63
unlikely      0.63
Likelihood    0.62
tramont       0.61
boulangerie   0.59
likelihood    0.59
Activations Density: 0.090%