    Explanations

    input filtering and validation

    Negative Logits
    "safety"            0.46
    " safety"           0.44
    "Safety"            0.42
    " dangereux"        0.42  (French: "dangerous")
    "prone"             0.41
    "dangerous"         0.40
    "ിച്ചി"              0.40
    " безопасность"     0.40  (Russian: "safety")
    "SAFETY"            0.39
    " susceptible"      0.39
    Positive Logits
    " purification"     0.73
    " purifier"         0.68
    " Purification"     0.67
    " purifying"        0.65
    " Sanit"            0.63
    " FILTER"           0.61
    " purify"           0.61
    " Filter"           0.61
    " filtration"       0.60
    " filter"           0.59
    Activation Density: 0.004%

    No Known Activations