INDEX
    Explanations

    phrases related to health risks or medical conditions

    Negative Logits
    Token            Logit
    "unn"            -0.14
    "_INITIALIZ"     -0.14
    "agua"           -0.14
    "ÏĦιν"           -0.13
    "imity"          -0.13
    "unami"          -0.13
    "406"            -0.12
    "ungan"          -0.12
    "епÑĤи"          -0.12
    "íĽĪ"            -0.12
    Positive Logits
    Token            Logit
    " into"          0.45
    "into"           0.40
    " early"         0.40
    " beyond"        0.40
    " onward"        0.39
    " onwards"       0.39
    " Into"          0.35
    "Into"           0.35
    "early"          0.34
    "_into"          0.33
    Activation Density: 0.063%

    No Known Activations