INDEX
    Explanations

    phrases related to discussions of ethical and social issues

    Negative Logits
    angu         -0.07
    ransition    -0.07
    mark         -0.07
    inkel        -0.06
    izo          -0.06
    lings        -0.06
    pty          -0.06
    iyim         -0.06
    odynam       -0.06
    alla         -0.06

    Positive Logits
    âĢĮâĢĮ        0.07
     actively     0.07
    yped          0.06
    aben          0.06
     root         0.06
    unfold        0.06
    .pb           0.06
     Benn         0.06
     Ùĥس          0.06
    عات           0.06
    Activation Density: 0.061%

    No Known Activations