INDEX
    Explanations

    questions or phrases related to ethical considerations and societal issues, particularly those involving racism and harmful stereotypes.

    Negative Logits
    "難しい"  0.49
    " eventuali"  0.47
    "結局"  0.45
    " liệu"  0.45
    (token not shown)  0.45
    "ية"  0.43
    (token not shown)  0.42
    "ließlich"  0.41
    " ஏனெனில்"  0.40
    (token not shown)  0.40
    Positive Logits
    " থাকত"  0.50
    " olisi"  0.45
    "нови"  0.44
    " থাকিত"  0.42
    " وقلنا"  0.42
    "...."  0.39
    "isher"  0.39
    " were"  0.38
    "就好了"  0.38
    "Were"  0.37
    Activation Density: 0.077%

    No Known Activations