INDEX
    Explanations

    phrases indicating exceptions or qualifications in general statements

    Negative Logits
    رشف  -0.69
    -0.65
    Халык  -0.64
     lenker  -0.62
    AsUp  -0.62
     '\\;'  -0.62
    twimg  -0.61
     ligiloj  -0.60
     BoxFit  -0.59
    сылкі  -0.58
    Positive Logits
     necessarily  0.55
    Alike  0.52
     lắm  0.49
    znacz  0.48
    ?}",  0.47
     šť  0.47
     perfetta  0.45
     isolato  0.45
    andidaten  0.44
     końca  0.44
    Activation Density: 0.299%

    No Known Activations