INDEX
Explanations
mentions of parameters in a technical context
Negative Logits
erman     -0.19
atel      -0.18
ress      -0.17
ear       -0.16
eyi       -0.16
arken     -0.16
ermann    -0.16
quiv      -0.15
elier     -0.15
arga      -0.15
Positive Logits
ized      0.28
etrize    0.27
ater      0.26
etric     0.26
etr       0.25
ization   0.23
aters     0.23
ised      0.23
ter       0.22
ters      0.21
Activations Density 0.031%
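
Activation density is usually read as the share of sampled tokens on which the feature fires. The sketch below assumes that reading. The function name `activation_density`, the `threshold` parameter, and the sample values are hypothetical and are not taken from the dashboard.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: a token counts as "active" when its activation is strictly
    greater than `threshold` (0.0 by default).
    """
    if len(activations) == 0:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 10,000 tokens, 3 of which activate the feature.
sample = [0.0] * 9997 + [1.2, 0.4, 2.5]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.031% would mean the feature fires on roughly 3 of every 10,000 sampled tokens.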