INDEX
    Explanations

    sexual exploitation and violence

    Negative Logits
    " unethical"       0.68
    " inappropriate"   0.61
    " questionable"    0.61
    " unsustainable"   0.57
    " dubious"         0.56
    " hasty"           0.55
    " unhealthy"       0.54
    " improper"        0.53
    " misleading"      0.53
    " unfair"          0.53
    Positive Logits
    " hearing"     0.82
    " seeing"      0.79
    "hearing"      0.77
    " Hearing"     0.72
    " Seeing"      0.67
    "seeing"       0.66
    "Hearing"      0.65
    " imagining"   0.61
    "Seeing"       0.61
    " να"          0.61
    Activation Density: 0.026%

    No Known Activations