INDEX
Explanations
numeric values related to statistics or scores
NEGATIVE LOGITS
Token        Logit
seventeen    -0.33
nineteen     -0.33
sixteen      -0.31
twenty       -0.31
eighteen     -0.30
二十         -0.29
fifteen      -0.29
XIV          -0.29
двад         -0.29
Twenty       -0.29
POSITIVE LOGITS
Token    Logit
12       0.55
11       0.54
10       0.51
9        0.50
8        0.49
7        0.48
6        0.47
5        0.47
4        0.45
3        0.44
Activations Density 0.065%
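
As a rough illustration of the density figure above, the sketch below assumes that activation density means the fraction of token positions at which the feature's activation is above zero. The page does not state its exact definition, so treat that reading as an assumption. The function name `activation_density`, the `threshold` parameter, and the toy data are all hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" is the share of token positions where the feature
    is active, with "active" meaning strictly greater than `threshold`.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data, chosen only so the result matches the figure shown:
# 13 active positions out of 20,000 tokens.
toy_activations = [0.0] * 19_987 + [1.5] * 13
density = activation_density(toy_activations)
print(f"{density:.3%}")
```

Under this reading, a density of 0.065% means the feature is active on roughly 6 to 7 of every 10,000 tokens.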