INDEX
    Explanations

    language related to controversial or sensitive topics, as well as terms related to legal and ethical issues

    topics related to whistleblowing, legal issues, and social controversies

    Negative Logits
    utterstock    -0.55
    ozy           -0.54
    ipel          -0.53
    asma          -0.53
    inis          -0.51
    bilt          -0.51
    amiya         -0.50
    ramid         -0.49
    outube        -0.48
    ibaba         -0.48
    Positive Logits
     exists       0.68
     existed      0.66
     might        0.64
     should       0.64
     cannot       0.63
     hadn         0.62
     could        0.62
     lacks        0.62
     ought        0.59
     lacked       0.57
    Activation Density: 1.222%

    No Known Activations