    Explanations

    phrasing and emphasis choice

    Negative Logits
        Token          Logit
                       0.43
        " നിരവധി"      0.41
        "爱好者"       0.41
        "每天"         0.40
        "১৯"           0.39
        "无论是"       0.39
        "!」"          0.39
        "涉及"         0.39
        "𝕦"            0.39
        "正常"         0.39
    Positive Logits
        Token            Logit
        " wording"       0.65
        " phrasing"      0.63
        " subtle"        0.58
        " eup"           0.57
        " vague"         0.52
        " worded"        0.51
        "emphasis"       0.50
        " ambiguous"     0.49
        " emphasized"    0.49
        " emphasis"      0.49
    Activation Density: 0.144%

    No Known Activations