INDEX
    Explanations

    terms related to safety, particularly in the context of accidents and regulations

    NEGATIVE LOGITS
    iros        -0.16
    égor        -0.16
    GPC         -0.15
    Ïĥμ         -0.15
    stants      -0.15
    ãĥ³ãĤ¿      -0.14
    é¥          -0.14
    adata       -0.14
    ãĤ¹ãģ®      -0.14
    okoj        -0.14
    POSITIVE LOGITS
     safety     0.22
     Safety     0.19
    Safety      0.18
    afety       0.17
     hazard     0.15
     Excell     0.15
     risk       0.15
     unsafe     0.14
     rule       0.14
    481         0.14
    Activation Density: 0.306%

    No Known Activations