Explanation: contrasting traditional with new
NEGATIVE LOGITS
Token          Value
lisi           0.37
Param          0.35
behest         0.35
hatta          0.35
auch           0.35
paramet        0.35
multip         0.34
Regardless     0.34
bądź           0.33
bele           0.33
POSITIVE LOGITS
Token          Value
Previously     1.08
previously     1.03
Previously     0.98
Whereas        0.97
previously     0.96
以前           0.95
従来の         0.95
従来           0.93
Unlike         0.92
traditionally  0.92
Activations Density: 0.275%