Explanations
contrasting traditional with new
Negative Logits
nejen  0.66
不仅仅  0.58
bukan  0.57
Bukan  0.55
ไม่ใช่  0.55
并非  0.54
вместо  0.54
instead  0.54
instead  0.48
unusually  0.48
Positive Logits
comparatively  0.70
relativamente  0.66
Relatively  0.65
relatively  0.63
relatively  0.62
relativement  0.55
whereas  0.54
比較的  0.54
digamos  0.52
মোটামুটি  0.50
Activations Density 0.095%