    Explanations

    frequent discourse markers or transitional phrases

    Negative Logits
    "hint"       -0.16
    "emouth"     -0.15
    "anca"       -0.15
    "à¹īาà¸ĩ"     -0.14
    "utter"      -0.14
    "icon"       -0.14
    "riz"        -0.14
    " darn"      -0.14
    "hook"       -0.14
    "oga"        -0.14
    Positive Logits
    " onto"      0.37
    "onto"       0.33
    " enough"    0.30
    " Enough"    0.27
    " Ont"       0.26
    "Ont"        0.24
    " back"      0.22
    " moving"    0.21
    "moving"     0.21
    "Enough"     0.21
    Activation Density: 0.092%

    No Known Activations