Explanations
words related to probabilities and odds, focusing on comparisons between different likelihoods or chances
Neuron Alignment
Index   Value   % of L₁
1044    +0.10   0.3%
663     +0.10   0.3%
1654    +0.09   0.3%
Correlated Neurons
Index   P. Corr.   Cos Sim.
1408    +0.10      0.04
1992    +0.10      0.03
1283    +0.09      0.03
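
The P. Corr. and Cos Sim. columns above can be illustrated with a minimal sketch. It assumes that "P. Corr." means Pearson correlation and "Cos Sim." means cosine similarity; the page does not say which vectors are compared, so the function names and the toy inputs below are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def pearson_correlation(a, b):
    """Pearson correlation: cosine similarity of the mean-centered vectors."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    centered_a = [x - mean_a for x in a]
    centered_b = [y - mean_b for y in b]
    return cosine_similarity(centered_a, centered_b)

# Toy vectors (hypothetical, not taken from the dashboard).
feature_values = [0.0, 0.2, 0.9, 0.1]
neuron_values = [0.1, 0.3, 0.7, 0.0]
corr = pearson_correlation(feature_values, neuron_values)
cos = cosine_similarity(feature_values, neuron_values)
```

The two measures differ only in the mean-centering step, so they can give quite different numbers for the same pair of vectors.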
Negative Logits
Token      Logit
FlatList   -0.46
rí         -0.46
geograf    -0.45
interag    -0.45
silang     -0.44
filma      -0.44
galer      -0.44
suscit     -0.43
Oltre      -0.43
iyon       -0.43
Positive Logits
Token           Logit
probability     0.95
odds            0.95
chances         0.94
chanced         0.91
probability     0.88
Chances         0.87
probabilities   0.85
chance          0.85
Probability     0.84
likelihood      0.83
Activations Density: 0.107%
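
As a rough illustration of what a density figure like this can represent, the sketch below computes the fraction of token positions where a feature's activation is above a threshold. This is an assumption about the metric, not a definition given on this page; the function name, the zero threshold, and the sample activations are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of positions whose activation is strictly above `threshold`."""
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for value in activations if value > threshold)
    return active / len(activations)

# Hypothetical activations over eight token positions; one is non-zero.
sample = [0.0, 0.0, 0.0, 1.7, 0.0, 0.0, 0.0, 0.0]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.107% would mean the feature fires on roughly one token in a thousand.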