    Explanations

    warnings or alerts signaling potential issues or dangers

    Negative Logits
        Token        Logit
        "RG"         -0.69
        "olved"      -0.67
        "entity"     -0.67
        "pole"       -0.66
        "iga"        -0.65
        "ashion"     -0.65
        " ×"         -0.65
        " âĸ"        -0.64
        "ovo"        -0.64
        "ater"       -0.64
    Positive Logits
        Token        Logit
        " warnings"   3.97
        " warning"    2.28
        " Warn"       2.01
        "warn"        1.88
        " warn"       1.75
        " advis"      1.74
        "warning"     1.73
        " alerts"     1.71
        " Warning"    1.71
        " warns"      1.63
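The logit lists read as the feature's direct effect on next-token predictions: warning-related tokens are boosted and unrelated word fragments are suppressed. Below is a minimal sketch of how such lists can be produced. It assumes, as is common for sparse-autoencoder feature dashboards but not stated on this page, that each score is the dot product of the feature's decoder direction with one column of the model's unembedding matrix W_U. The function `top_logits` and the toy vectors are hypothetical.

```python
def top_logits(decoder_dir, unembed, vocab, k=10):
    """Rank vocabulary tokens by a feature's direct logit effect.

    decoder_dir: d_model floats, the feature's (hypothetical) decoder direction.
    unembed: d_model rows of vocab_size floats, standing in for W_U.
    vocab: vocab_size token strings.
    Returns (most positive, most negative) lists of (token, score) pairs.
    """
    d_model = len(decoder_dir)
    # Score for token t is the dot product of decoder_dir with column t of W_U.
    scores = [
        sum(decoder_dir[d] * unembed[d][t] for d in range(d_model))
        for t in range(len(vocab))
    ]
    # Sort ascending by score: the head holds the most negative tokens.
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    most_negative = ranked[:k]
    most_positive = ranked[::-1][:k]
    return most_positive, most_negative


# Toy example with d_model = 2 and a five-token vocabulary (made-up numbers).
vocab = [" warnings", " warning", "RG", "olved", " the"]
unembed = [
    [2.0, 1.0, -0.7, -0.6, 0.0],
    [2.0, 0.4, -0.2, 0.0, 0.2],
]
decoder_dir = [1.0, 0.5]
positive, negative = top_logits(decoder_dir, unembed, vocab, k=2)
print([tok for tok, _ in positive])  # [' warnings', ' warning']
print([tok for tok, _ in negative])  # ['RG', 'olved']
```

With a real model, `unembed` would have tens of thousands of columns and the same ranking would yield ten-entry lists like the ones above.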
    Activation Density: 0.021%
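Activation density is the fraction of tokens on which the feature fires, so 0.021% means roughly 1 token in 4,760. A minimal sketch, assuming a feature counts as active when its activation is strictly greater than zero (the exact threshold behind the figure above is not given here); `activation_density` and the sample data are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold`."""
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 21 of 100,000 tokens.
acts = [0.0] * 100_000
for i in range(21):
    acts[i * 4_000] = 1.5  # distinct positions 0, 4000, ..., 80000
density = activation_density(acts)
print(f"{density:.3%}")  # 0.021%
```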

    No Known Activations