INDEX
    Explanations

    phrases relating to expectations and outcomes

    Negative Logits
    ягом
    -0.15
     Крім
    -0.12
    одар
    -0.12
    ,以及
    -0.12
    жу
    -0.12
    rire
    -0.11
    uming
    -0.11
    Ķëĭ¤
    -0.11
    たり
    -0.11
    loha
    -0.11
    Positive Logits
     but
    1.15
    but
    0.93
     nhưng
    0.84
     BUT
    0.77
    但
    0.74
     но
    0.74
    _but
    0.73
     pero
    0.71
     But
    0.71
    ,但
    0.71
    Act Density 4.197%

    No Known Activations