Explanations
mean or rate related to outcome
Negative Logits
Token      Logit
álním      0.46
('.')[     0.41
ichtig     0.40
пропор     0.40
Bracket    0.39
হোক        0.39
括号       0.38
Between    0.38
стала      0.38
علاق       0.38
Positive Logits
Token      Logit
very       0.49
almost     0.45
barely     0.45
apenas     0.42
slechts    0.41
низ        0.41
low        0.40
amort      0.39
lower      0.39
much       0.38
Activations Density 0.006%