INDEX
Explanations
sometimes introducing examples
Negative Logits
Token         Value
Unemployment  0.44
sollten       0.42
zuc           0.42
Should        0.41
Ventilation   0.40
ataire        0.39
uerre         0.39
Somewhat      0.39
bør           0.39
viel          0.38
Positive Logits
Token       Value
oftentimes  0.48
たとえば    0.39
場合        0.38
例如        0.38
ง่าย        0.38
например    0.38
比如        0.37
ঘাট         0.37
┳           0.37
sometimes   0.36
Activations Density 0.001%