    Explanations

    health related

    Negative Logits
     GenerationType     -0.94
     Мексичка           -0.90
    findpost            -0.89
     disambiguazione    -0.88
     surla              -0.79
     kasarigan          -0.77
    tanleria            -0.77
    NUMX                -0.75
    httphttps           -0.75
     ويكيميديا          -0.75
    Positive Logits
    ted                 0.54
    ting                0.50
    the                 0.48
     order              0.48
    try                 0.47
     need               0.46
    ss                  0.45
     time               0.45
    czyn                0.44
    ann                 0.44
    Activation Density 0.073%

    No Known Activations