INDEX
    Explanations

    phrases indicating protection or safety from various dangers or negative influences

    Negative Logits
     sparing
    -0.14
    acific
    -0.14
     <<<
    -0.14
    èĥ¶
    -0.14
    ahl
    -0.14
    ĶĶ
    -0.14
    /fwlink
    -0.13
    luent
    -0.13
     ÙħØ´Ú©
    -0.13
    oyer
    -0.13
    Positive Logits
     scrutiny
    0.25
     attack
    0.24
     being
    0.20
     criticism
    0.19
     becoming
    0.19
     harm
    0.18
     harms
    0.18
     attacks
    0.18
    æĿ¥èĩª
    0.18
     further
    0.17
    Act Density 0.160%

    No Known Activations