Explanations
references to programming functions and structures
Negative Logits
twenty   -0.25
22       -0.24
21       -0.23
23       -0.23
24       -0.23
Twenty   -0.23
twenty   -0.21
二十     -0.20
Twenty   -0.20
25       -0.18
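Positive Logits
[token not shown]   0.36
[token not shown]   0.36
[token not shown]   0.32
[token not shown]   0.28
[token not shown]   0.27
[token not shown]   0.24
[token not shown]   0.24
[token not shown]   0.23
itten               0.22
----------------------------   0.22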
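The logit lists above rank vocabulary tokens by how strongly this feature pushes their output logits down (negative) or up (positive). Below is a minimal sketch, assuming (as is common for sparse-autoencoder dashboards, though this page does not state it) that each score is the dot product of the feature's decoder direction with that token's column of the unembedding matrix `W_U`. The function names `logit_effects` and `top_and_bottom`, and all of the toy numbers, are hypothetical.

```python
# Hypothetical sketch: rank tokens by a feature's direct effect on the logits.
# Assumption: score[t] = sum_i decoder_dir[i] * W_U[i][t], where W_U has shape
# (d_model, vocab) and decoder_dir has length d_model.

def logit_effects(decoder_dir, W_U):
    """Return one score per vocabulary token (one per column of W_U)."""
    vocab_size = len(W_U[0])
    return [
        sum(d * row[t] for d, row in zip(decoder_dir, W_U))
        for t in range(vocab_size)
    ]

def top_and_bottom(scores, tokens, k):
    """Return (positive, negative) lists of (token, score) pairs, k of each."""
    ranked = sorted(zip(tokens, scores), key=lambda pair: pair[1])
    negative = ranked[:k]        # most negative scores first
    positive = ranked[::-1][:k]  # most positive scores first
    return positive, negative

if __name__ == "__main__":
    # Toy example with d_model = 2 and a 4-token vocabulary (made-up values).
    tokens = ["twenty", "22", "itten", "foo"]
    decoder_dir = [1.0, -0.5]
    W_U = [[-0.2, -0.1, 0.3, 0.0],
           [0.1, 0.2, 0.0, 0.0]]
    scores = logit_effects(decoder_dir, W_U)
    positive, negative = top_and_bottom(scores, tokens, k=2)
    print("positive:", positive)
    print("negative:", negative)
```

Because the score is a fixed linear readout, it describes what the feature would do to the logits if it fired, independent of any particular input text.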
Activations Density 0.010%
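One common reading of the density figure, assumed here rather than confirmed by the page, is the fraction of sampled token positions on which the feature's activation is above zero. Under that assumption, 0.010% means roughly one firing per 10,000 tokens. The helper `activation_density` and its `threshold` parameter are hypothetical.

```python
# Hypothetical sketch: activation density as the share of token positions
# where a feature's activation exceeds a threshold (0.0 by default).

def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation is above the threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)

if __name__ == "__main__":
    # One firing position out of 10,000 (made-up activations).
    acts = [0.0] * 9_999 + [3.2]
    print(f"{activation_density(acts):.3%}")  # prints 0.010%
```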