    Explanations

    statements that emphasize ideological critique or moral positions

    Negative Logits
    Token         Logit
    ":"           -0.16
    " &&"         -0.13
    ","           -0.13
    " Ø£ÙĬضا"     -0.13
    " latter"     -0.13
    "ÃĹ"          -0.13
    " také"       -0.12
    " ÙĨÛĮز"      -0.12
    "acman"       -0.12
    "oe"          -0.12
    Positive Logits
    Token         Logit
    " namely"     0.30
    " There"      0.25
    " there"      0.25
    " It"         0.24
    " while"      0.24
    " If"         0.23
    " whereas"    0.23
    " Whereas"    0.23
    " While"      0.23
    " it"         0.23
    Activation Density: 0.156%

    No Known Activations