INDEX
Explanations
terms relating to optimization
Negative Logits
ible      -0.19
leton     -0.19
eled      -0.19
erate     -0.16
eos       -0.16
ibles     -0.16
icular    -0.16
ey        -0.15
/*č↵      -0.15
eing      -0.15
Positive Logits
ally         0.33
ised         0.30
izes         0.29
ized         0.28
izing        0.26
izers        0.26
isation      0.24
istic        0.24
istically    0.23
ization      0.23
Activations Density 0.008%