INDEX
    Explanations

    references to controversial subjects, particularly related to language, sex, and violence in media

    Negative Logits
     cascade      -0.16
    utin          -0.15
    anko          -0.14
     GC           -0.14
    _gc           -0.14
     Kun          -0.14
     Cascade      -0.13
    cascade       -0.13
    mel           -0.13
     afr          -0.13
    Positive Logits
    ãĥ¼ãĥŀ        0.17
     offensive    0.15
    zon           0.15
     Offensive    0.15
    locker        0.15
    spb           0.15
     UnityEditor  0.15
     CONTENT      0.14
     content      0.14
    ÑģÑİ          0.14
    Activation Density: 0.309%

    No Known Activations