    Explanations

    content policy violations

    Negative Logits
    Token         Logit
    " Miracle"    -0.08
    " Poc"        -0.08
    "Dit"         -0.08
    " miracle"    -0.08
    "scale"       -0.08
    "acent"       -0.08
    "mir"         -0.08
    "irai"        -0.08
    " dit"        -0.07
    " blush"      -0.07
    Positive Logits
    Token           Logit
    "uous"          0.09
    " violence"     0.08
    " overthrow"    0.08
    " Gewalt"       0.08
    "വണ"            0.07
    " imaginar"     0.07
    ".hp"           0.07
    "requ"          0.07
    " данный"       0.07
    " Ange"         0.07
    Act Density 0.007%

    No Known Activations