INDEX
Explanations
technical terms related to parameters in various contexts
Negative Logits
Token     Logit
erman     -0.20
ress      -0.18
arga      -0.17
resses    -0.17
atel      -0.17
eyi       -0.17
resse     -0.16
ear       -0.16
eltas     -0.16
elier     -0.16
Positive Logits
Token     Logit
ized      0.28
etrize    0.26
ater      0.24
etric     0.24
ization   0.23
ised      0.23
etr       0.23
ter       0.22
aters     0.21
izable    0.21
Activations Density: 0.035%
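
As a rough illustration of what a density figure like this can mean, the sketch below assumes that density is the fraction of token positions on which the feature activates above zero. That definition, the function name activation_density, the threshold parameter, and the sample data are all assumptions made for illustration; the dashboard does not state how its value was computed.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions on which the
    feature fires. The source does not confirm this definition.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: the feature fires on 7 of 20,000 token positions.
sample = [0.0] * 19993 + [1.5] * 7
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.035%
```

Under this reading, a density of 0.035% would correspond to the feature firing on roughly 35 out of every 100,000 tokens, which would make it a sparse feature.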