INDEX
    Explanations

    Terms related to health, specifically medical conditions and terms signaling caution or the assessment of health risks

    Negative Logits
     же
    -0.20
    oi
    -0.19
    ове
    -0.18
    uv
    -0.17
    iw
    -0.17
    uw
    -0.17
    евич
    -0.16
    ил
    -0.16
    479
    -0.16
    u
    -0.16
    Positive Logits
    є
    0.38
    ють
    0.37
    ється
    0.33
    єте
    0.31
    ючи
    0.28
    ються
    0.28
    єш
    0.25
    ємо
    0.25
    ю
    0.21
    Є
    0.21
    Activation Density 0.011%

    No Known Activations