INDEX
Explanations
references to scores or points in various contexts
Neuron Alignment
Index   Value   % of L₁
1296    +0.13   0.5%
521     +0.13   0.5%
699     +0.13   0.5%
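The "% of L₁" column is not defined on the page. One plausible reading, assumed here, is that it gives each neuron's share of the L₁ norm (sum of absolute values) of the feature's weight vector. The sketch below illustrates that reading only; the function name `l1_share` and the example weights are hypothetical and are not taken from the dashboard.

```python
def l1_share(weights):
    """Return each entry's fraction of the vector's L1 norm.

    Assumption: "% of L1" means |w_i| / sum_j |w_j|.
    """
    total = sum(abs(w) for w in weights)
    if total == 0:
        # An all-zero vector has no meaningful shares.
        return [0.0] * len(weights)
    return [abs(w) / total for w in weights]


# Hypothetical weights, chosen so the shares are easy to check by hand.
example_weights = [0.5, -0.25, 0.25]
shares = l1_share(example_weights)
for index, share in enumerate(shares):
    print(f"neuron {index}: {share:.1%} of L1")
```

Under this reading, a value of 0.5% for the top-aligned neurons would mean the feature's weight is spread thinly over many neurons rather than concentrated in a few.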
Correlated Neurons
Index   P. Corr.   Cos Sim.
699     +0.13      0.04
214     +0.13      0.03
521     +0.13      0.03
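The two columns above report different similarity measures, which is why their values need not match. As a minimal sketch, assuming "P. Corr." is the Pearson correlation and "Cos Sim." is the cosine similarity between two vectors, the code below shows how each is computed. The page does not say which vectors are compared (activations or weights), so the inputs here are hypothetical.

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between x and y (no mean-centering)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def pearson_correlation(x, y):
    """Pearson correlation: cosine similarity of the mean-centered vectors."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    centered_x = [a - mean_x for a in x]
    centered_y = [b - mean_y for b in y]
    return cosine_similarity(centered_x, centered_y)


# Hypothetical vectors: perfectly anti-correlated, yet pointing in a
# broadly similar direction because all entries are positive.
x = [1.0, 2.0, 3.0]
y = [3.0, 2.0, 1.0]
print(f"pearson = {pearson_correlation(x, y):+.2f}")
print(f"cosine  = {cosine_similarity(x, y):+.2f}")
```

The only difference between the two is the mean-centering step, so vectors with a large shared offset can have a high cosine similarity and a low or negative correlation, or the reverse.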
Negative Logits
Mű          -0.49
Público     -0.47
volezza     -0.47
ejo         -0.46
אֲ           -0.46
stornos     -0.46
buie        -0.46
tivazione   -0.44
HttpPut     -0.44
Observa     -0.44
Positive Logits
score     1.38
scores    1.31
Scores    1.26
score     1.25
Score     1.24
Scoring   1.21
SCORE     1.21
scores    1.17
scored    1.13
SCORE     1.13
Activations Density: 0.080%