INDEX
    Explanations

    obviously, presumably, seemingly

    Negative Logits
    Token          Value
    "RAJ"          0.53
    "*:"           0.48
    " شب"          0.44
    "شب"           0.43
    "НЫ"           0.42
    " кү"          0.41
    "+:"           0.40
    "oL"           0.40
    "}}^{*"        0.40
    " Государ"     0.40
    Positive Logits
    Token            Value
    "这是一个"        0.60
    "Obviously"      0.59
    " Obviously"     0.55
    " prevalent"     0.55
    " obviously"     0.54
    " seemingly"     0.52
    " supposedly"    0.51
    " Presumably"    0.51
    "显然"            0.50
    " presumably"    0.50
    Act Density 0.321%

    No Known Activations