INDEX
    Explanations

    phrases indicating health-related issues and concerns

    Negative Logits
    "INET"           -0.16
    "indow"          -0.16
    "fo"             -0.15
    "astos"          -0.15
    "رده"            -0.15
    " PLUS"          -0.15
    " Plus"          -0.15
    "anela"          -0.15
    "BorderStyle"    -0.15
    "ometr"          -0.14
    Positive Logits
    " reasons"       0.22
    "原因"           0.22
    " partly"        0.19
    " reason"        0.17
    " because"       0.17
    " Reasons"       0.17
    " partially"     0.16
    "ิโ"             0.16
    "ٌ"              0.15
    "822"            0.15
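    Several of these tokens are multi-byte characters. In a GPT-2-style byte-level BPE vocabulary their raw strings appear as mojibake, e.g. `åİŁåĽł` for 原因 ("reason") or `Ġreasons` for " reasons". Assuming the underlying model uses that byte-to-unicode scheme (the model is not named on this page), a minimal Python sketch for decoding raw token strings; `decode_token` is a hypothetical helper name:

```python
def bytes_to_unicode() -> dict[int, str]:
    """GPT-2-style mapping from each byte (0-255) to a printable character."""
    # Printable bytes map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    # Remaining bytes (control chars, space, etc.) map to code points 256+.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


# Inverse table: printable character -> original byte value.
UNICODE_TO_BYTE = {c: b for b, c in bytes_to_unicode().items()}


def decode_token(raw: str) -> str:
    """Turn a raw byte-level BPE token string back into readable text."""
    data = bytes(UNICODE_TO_BYTE[ch] for ch in raw)
    # A single token may end mid-character, so replace invalid sequences.
    return data.decode("utf-8", errors="replace")


print(decode_token("åİŁåĽł"))   # 原因
print(repr(decode_token("Ġreasons")))  # ' reasons'
```

    The `errors="replace"` choice matters because byte-level tokens can split a multi-byte character across two tokens; a lone fragment then decodes to U+FFFD instead of raising.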
    Activation Density 0.161%
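    Activation density is commonly reported as the share of sampled tokens on which the feature fires. Assuming "fires" means an activation strictly above zero (the threshold and the sample corpus are not stated on this page), a minimal sketch; `activation_density` is a hypothetical helper:

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation exceeds `threshold`."""
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 161 active positions out of 100,000 tokens.
sample = [0.0] * 99_839 + [1.0] * 161
print(f"{activation_density(sample):.3%}")  # 0.161%
```

    Under this reading, a density of 0.161% means the feature is active on roughly 161 of every 100,000 tokens, i.e. it is a sparse feature.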

    No Known Activations