INDEX
    Explanations

    phrases related to threats or dangers

    Negative Logits
    "rint"      -0.15
    "ehler"     -0.15
    "amin"      -0.15
    " Worm"     -0.15
    "iao"       -0.15
    "arent"     -0.15
    "agues"     -0.14
    ".gstatic"  -0.14
    "olest"     -0.14
    "ledon"     -0.14

    Positive Logits
    " dangers"  0.19
    " danger"   0.18
    "baar"      0.16
    "çĬ¶"       0.15
    "lug"       0.15
    "elm"       0.15
    "132"       0.15
    "ources"    0.15
    "ably"      0.14
    " Danger"   0.14

    Activation Density: 0.029%

    No Known Activations