INDEX
    Explanations

    words and phrases related to safety and security

    Negative Logits
    725         -0.16
    oze         -0.15
    cene        -0.15
    azer        -0.15
    gne         -0.15
    oge         -0.14
    aldo        -0.14
    ãģĿãĤĮ      -0.14
    ink         -0.14
     charges    -0.14

    Positive Logits
     Unsafe     0.19
    çī          0.17
    ubern       0.16
    unsafe      0.16
     safer      0.16
    Unsafe      0.16
    safe        0.16
    ÑģÑĤÑĮ      0.15
     safe       0.15
    AreaView    0.15
    Activation Density: 0.070%

    No Known Activations