    Negative Logits
    Token                  Logit
    " betweenstory"        -0.82
    " ModelExpression"     -0.77
    "AndEndTag"            -0.72
    " ویکی‌پدیای"           -0.68
    "twimg"                -0.68
    "Geplaatst"            -0.66
    "@[+]["                -0.65
    "peche"                -0.63
    " <<<<<<<<<<<<<<"      -0.63
    "Tikang"               -0.63
    Positive Logits
    Token                  Logit
    "kopp"                 0.42
    " trends"              0.39
    "edback"               0.38
    "hasiswa"              0.37
    "üğ"                   0.37
    "ceuticals"            0.36
    "这就是"                0.36
    " uchun"               0.35
    "üğü"                  0.35
    "IMC"                  0.35
    Activation Density: 0.000%

    No Known Activations